Designing small universal k-mer hitting sets for improved analysis of high-throughput sequencing
Publication Date
October 02, 2017
Authors
Yaron Orenstein, David Pellow, Guillaume Marçais, Ron Shamir, Carl Kingsford
Volume
13
Issue
10
Pages
e1005777
DOI
https://dx.plos.org/10.1371/journal.pcbi.1005777
Publisher URL
http://journals.plos.org/ploscompbiol/article?id=10.1371%2Fjournal.pcbi.1005777
Scopus
85031845869
Mendeley
http://www.mendeley.com/research/designing-small-universal-kmer-hitting-sets-improved-analysis-highthroughput-sequencing

Mendeley | Further Information

{"title"=>"Designing small universal k-mer hitting sets for improved analysis of high-throughput sequencing", "type"=>"journal", "authors"=>[{"first_name"=>"Yaron", "last_name"=>"Orenstein", "scopus_author_id"=>"47061955900"}, {"first_name"=>"David", "last_name"=>"Pellow", "scopus_author_id"=>"57188879660"}, {"first_name"=>"Guillaume", "last_name"=>"Marçais", "scopus_author_id"=>"26530593700"}, {"first_name"=>"Ron", "last_name"=>"Shamir", "scopus_author_id"=>"7005411902"}, {"first_name"=>"Carl", "last_name"=>"Kingsford", "scopus_author_id"=>"35611336100"}], "year"=>2017, "source"=>"PLoS Computational Biology", "identifiers"=>{"sgr"=>"85031845869", "doi"=>"10.1371/journal.pcbi.1005777", "issn"=>"15537358", "pui"=>"619072778", "isbn"=>"1111111111", "scopus"=>"2-s2.0-85031845869"}, "id"=>"3437c0c0-4511-355b-a1b0-2c259966358f", "abstract"=>"With the rapidly increasing volume of deep sequencing data, more efficient algorithms and data structures are needed. Minimizers are a central recent paradigm that has improved various sequence analysis tasks, including hashing for faster read overlap detection, sparse suffix arrays for creating smaller indexes, and Bloom filters for speeding up sequence search. Here, we propose an alternative paradigm that can lead to substantial further improvement in these and other tasks. For integers k and L > k, we say that a set of k-mers is a universal hitting set (UHS) if every possible L-long sequence must contain a k-mer from the set. We develop a heuristic called DOCKS to find a compact UHS, which works in two phases: The first phase is solved optimally, and for the second we propose several efficient heuristics, trading set size for speed and memory. The use of heuristics is motivated by showing the NP-hardness of a closely related problem. We show that DOCKS works well in practice and produces UHSs that are very close to a theoretical lower bound. 
We present results for various values of k and L and by applying them to real genomes show that UHSs indeed improve over minimizers. In particular, DOCKS uses less than 30% of the 10-mers needed to span the human genome compared to minimizers. The software and computed UHSs are freely available at github.com/Shamir-Lab/DOCKS/ and acgt.cs.tau.ac.il/docks/, respectively.", "link"=>"http://www.mendeley.com/research/designing-small-universal-kmer-hitting-sets-improved-analysis-highthroughput-sequencing", "reader_count"=>8, "reader_count_by_academic_status"=>{"Researcher"=>1, "Student > Ph. D. Student"=>4, "Other"=>1, "Student > Bachelor"=>1, "Professor"=>1}, "reader_count_by_user_role"=>{"Researcher"=>1, "Student > Ph. D. Student"=>4, "Other"=>1, "Student > Bachelor"=>1, "Professor"=>1}, "reader_count_by_subject_area"=>{"Biochemistry, Genetics and Molecular Biology"=>2, "Agricultural and Biological Sciences"=>2, "Computer Science"=>4}, "reader_count_by_subdiscipline"=>{"Agricultural and Biological Sciences"=>{"Agricultural and Biological Sciences"=>2}, "Computer Science"=>{"Computer Science"=>4}, "Biochemistry, Genetics and Molecular Biology"=>{"Biochemistry, Genetics and Molecular Biology"=>2}}, "group_count"=>0}
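
The universal hitting set definition quoted in the abstract above can be checked directly for very small parameters. The following is a minimal brute-force sketch in Python, not the DOCKS heuristic itself: the function name `is_uhs` and the default DNA alphabet are assumptions made for illustration, and the enumeration grows as |alphabet|^L, so it is only practical for toy values of k and L.

```python
from itertools import product


def is_uhs(kmers, k, L, alphabet="ACGT"):
    """Return True if every L-long sequence contains a k-mer from `kmers`.

    Illustrative brute force only (hypothetical helper, not part of DOCKS):
    it enumerates all len(alphabet)**L sequences.
    """
    if not L > k > 0:
        raise ValueError("the definition requires integers k and L with L > k > 0")
    kmer_set = set(kmers)
    for letters in product(alphabet, repeat=L):
        seq = "".join(letters)
        # Slide a window of width k across the L-long sequence.
        hit = any(seq[i:i + k] in kmer_set for i in range(L - k + 1))
        if not hit:
            return False  # found an L-long sequence that avoids the set
    return True


if __name__ == "__main__":
    # Over a two-letter alphabet with k=2 and L=3, three 2-mers suffice.
    print(is_uhs({"AA", "BB", "AB"}, k=2, L=3, alphabet="AB"))
    # Dropping "AA" leaves the sequence "AAA" unhit.
    print(is_uhs({"BB", "AB"}, k=2, L=3, alphabet="AB"))
```

The set of all k-mers is trivially a UHS; the point of the paper's approach, as the abstract states, is to find a much smaller one.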

Scopus | Further Information

{"@_fa"=>"true", "link"=>[{"@_fa"=>"true", "@ref"=>"self", "@href"=>"https://api.elsevier.com/content/abstract/scopus_id/85031845869"}, {"@_fa"=>"true", "@ref"=>"author-affiliation", "@href"=>"https://api.elsevier.com/content/abstract/scopus_id/85031845869?field=author,affiliation"}, {"@_fa"=>"true", "@ref"=>"scopus", "@href"=>"https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85031845869&origin=inward"}, {"@_fa"=>"true", "@ref"=>"scopus-citedby", "@href"=>"https://www.scopus.com/inward/citedby.uri?partnerID=HzOxMe3b&scp=85031845869&origin=inward"}], "prism:url"=>"https://api.elsevier.com/content/abstract/scopus_id/85031845869", "dc:identifier"=>"SCOPUS_ID:85031845869", "eid"=>"2-s2.0-85031845869", "dc:title"=>"Designing small universal k-mer hitting sets for improved analysis of high-throughput sequencing", "dc:creator"=>"Orenstein Y.", "prism:publicationName"=>"PLoS Computational Biology", "prism:issn"=>"1553734X", "prism:eIssn"=>"15537358", "prism:volume"=>"13", "prism:issueIdentifier"=>"10", "prism:pageRange"=>nil, "prism:coverDate"=>"2017-10-01", "prism:coverDisplayDate"=>"October 2017", "prism:doi"=>"10.1371/journal.pcbi.1005777", "citedby-count"=>"2", "affiliation"=>[{"@_fa"=>"true", "affilname"=>"MIT Computer Science and Artificial Intelligence Laboratory", "affiliation-city"=>"Cambridge", "affiliation-country"=>"United States"}], "pubmed-id"=>"28968408", "prism:aggregationType"=>"Journal", "subtype"=>"ar", "subtypeDescription"=>"Article", "article-number"=>"e1005777", "source-id"=>"4000151810", "openaccess"=>"1", "openaccessFlag"=>true}

Counter

  • {"month"=>"10", "year"=>"2017", "pdf_views"=>"105", "xml_views"=>"26", "html_views"=>"797"}
  • {"month"=>"11", "year"=>"2017", "pdf_views"=>"59", "xml_views"=>"12", "html_views"=>"239"}
  • {"month"=>"12", "year"=>"2017", "pdf_views"=>"19", "xml_views"=>"3", "html_views"=>"93"}
  • {"month"=>"1", "year"=>"2018", "pdf_views"=>"35", "xml_views"=>"0", "html_views"=>"83"}
  • {"month"=>"2", "year"=>"2018", "pdf_views"=>"14", "xml_views"=>"0", "html_views"=>"53"}
  • {"month"=>"3", "year"=>"2018", "pdf_views"=>"16", "xml_views"=>"0", "html_views"=>"54"}
  • {"month"=>"4", "year"=>"2018", "pdf_views"=>"8", "xml_views"=>"0", "html_views"=>"29"}
  • {"month"=>"5", "year"=>"2018", "pdf_views"=>"3", "xml_views"=>"0", "html_views"=>"32"}
  • {"month"=>"6", "year"=>"2018", "pdf_views"=>"9", "xml_views"=>"0", "html_views"=>"39"}
  • {"month"=>"7", "year"=>"2018", "pdf_views"=>"9", "xml_views"=>"3", "html_views"=>"39"}
  • {"month"=>"8", "year"=>"2018", "pdf_views"=>"8", "xml_views"=>"2", "html_views"=>"20"}
  • {"month"=>"9", "year"=>"2018", "pdf_views"=>"13", "xml_views"=>"1", "html_views"=>"46"}
  • {"month"=>"10", "year"=>"2018", "pdf_views"=>"15", "xml_views"=>"1", "html_views"=>"40"}
  • {"month"=>"11", "year"=>"2018", "pdf_views"=>"38", "xml_views"=>"0", "html_views"=>"63"}
  • {"month"=>"12", "year"=>"2018", "pdf_views"=>"50", "xml_views"=>"0", "html_views"=>"62"}
  • {"month"=>"1", "year"=>"2019", "pdf_views"=>"5", "xml_views"=>"0", "html_views"=>"32"}
  • {"month"=>"2", "year"=>"2019", "pdf_views"=>"12", "xml_views"=>"0", "html_views"=>"45"}
  • {"month"=>"3", "year"=>"2019", "pdf_views"=>"15", "xml_views"=>"4", "html_views"=>"37"}
  • {"month"=>"4", "year"=>"2019", "pdf_views"=>"19", "xml_views"=>"0", "html_views"=>"39"}
  • {"month"=>"5", "year"=>"2019", "pdf_views"=>"18", "xml_views"=>"0", "html_views"=>"39"}
  • {"month"=>"6", "year"=>"2019", "pdf_views"=>"15", "xml_views"=>"0", "html_views"=>"47"}
  • {"month"=>"7", "year"=>"2019", "pdf_views"=>"12", "xml_views"=>"0", "html_views"=>"21"}
  • {"month"=>"8", "year"=>"2019", "pdf_views"=>"7", "xml_views"=>"0", "html_views"=>"21"}

PMC Usage Stats

  • {"unique-ip"=>"28", "full-text"=>"25", "pdf"=>"12", "abstract"=>"0", "scanned-summary"=>"0", "scanned-page-browse"=>"0", "figure"=>"3", "supp-data"=>"1", "cited-by"=>"0", "year"=>"2017", "month"=>"11"}
  • {"unique-ip"=>"20", "full-text"=>"14", "pdf"=>"3", "abstract"=>"0", "scanned-summary"=>"0", "scanned-page-browse"=>"0", "figure"=>"4", "supp-data"=>"4", "cited-by"=>"0", "year"=>"2017", "month"=>"12"}
  • {"unique-ip"=>"12", "full-text"=>"10", "pdf"=>"4", "abstract"=>"0", "scanned-summary"=>"0", "scanned-page-browse"=>"0", "figure"=>"3", "supp-data"=>"1", "cited-by"=>"0", "year"=>"2018", "month"=>"1"}
  • {"unique-ip"=>"7", "full-text"=>"4", "pdf"=>"8", "abstract"=>"0", "scanned-summary"=>"0", "scanned-page-browse"=>"0", "figure"=>"0", "supp-data"=>"0", "cited-by"=>"0", "year"=>"2018", "month"=>"3"}
  • {"unique-ip"=>"7", "full-text"=>"10", "pdf"=>"0", "scanned-summary"=>"0", "scanned-page-browse"=>"0", "figure"=>"0", "supp-data"=>"1", "cited-by"=>"0", "year"=>"2019", "month"=>"1"}
  • {"unique-ip"=>"5", "full-text"=>"5", "pdf"=>"1", "scanned-summary"=>"0", "scanned-page-browse"=>"0", "figure"=>"0", "supp-data"=>"0", "cited-by"=>"0", "year"=>"2018", "month"=>"5"}
  • {"unique-ip"=>"6", "full-text"=>"6", "pdf"=>"1", "scanned-summary"=>"0", "scanned-page-browse"=>"0", "figure"=>"0", "supp-data"=>"0", "cited-by"=>"0", "year"=>"2018", "month"=>"4"}
  • {"unique-ip"=>"4", "full-text"=>"4", "pdf"=>"0", "scanned-summary"=>"0", "scanned-page-browse"=>"0", "figure"=>"0", "supp-data"=>"0", "cited-by"=>"0", "year"=>"2018", "month"=>"6"}
  • {"unique-ip"=>"9", "full-text"=>"8", "pdf"=>"2", "scanned-summary"=>"0", "scanned-page-browse"=>"0", "figure"=>"0", "supp-data"=>"1", "cited-by"=>"0", "year"=>"2018", "month"=>"7"}
  • {"unique-ip"=>"12", "full-text"=>"10", "pdf"=>"4", "scanned-summary"=>"0", "scanned-page-browse"=>"0", "figure"=>"0", "supp-data"=>"0", "cited-by"=>"0", "year"=>"2018", "month"=>"8"}
  • {"unique-ip"=>"7", "full-text"=>"5", "pdf"=>"2", "scanned-summary"=>"0", "scanned-page-browse"=>"0", "figure"=>"0", "supp-data"=>"0", "cited-by"=>"0", "year"=>"2018", "month"=>"9"}
  • {"unique-ip"=>"10", "full-text"=>"10", "pdf"=>"3", "scanned-summary"=>"0", "scanned-page-browse"=>"0", "figure"=>"0", "supp-data"=>"4", "cited-by"=>"0", "year"=>"2018", "month"=>"11"}
  • {"unique-ip"=>"13", "full-text"=>"7", "pdf"=>"8", "scanned-summary"=>"0", "scanned-page-browse"=>"0", "figure"=>"0", "supp-data"=>"0", "cited-by"=>"0", "year"=>"2018", "month"=>"10"}
  • {"unique-ip"=>"21", "full-text"=>"25", "pdf"=>"8", "scanned-summary"=>"0", "scanned-page-browse"=>"0", "figure"=>"0", "supp-data"=>"0", "cited-by"=>"0", "year"=>"2018", "month"=>"12"}
  • {"unique-ip"=>"11", "full-text"=>"10", "pdf"=>"0", "scanned-summary"=>"0", "scanned-page-browse"=>"0", "figure"=>"1", "supp-data"=>"0", "cited-by"=>"0", "year"=>"2019", "month"=>"2"}
  • {"unique-ip"=>"10", "full-text"=>"10", "pdf"=>"1", "scanned-summary"=>"0", "scanned-page-browse"=>"0", "figure"=>"0", "supp-data"=>"2", "cited-by"=>"0", "year"=>"2019", "month"=>"3"}
  • {"unique-ip"=>"9", "full-text"=>"10", "pdf"=>"0", "scanned-summary"=>"0", "scanned-page-browse"=>"0", "figure"=>"0", "supp-data"=>"1", "cited-by"=>"0", "year"=>"2019", "month"=>"4"}
  • {"unique-ip"=>"13", "full-text"=>"15", "pdf"=>"1", "scanned-summary"=>"0", "scanned-page-browse"=>"0", "figure"=>"0", "supp-data"=>"0", "cited-by"=>"0", "year"=>"2019", "month"=>"5"}