CLUSTOM-CLOUD: In-Memory Data Grid-Based Software for Clustering 16S rRNA Sequence Data in the Cloud Environment
Events

Mendeley | Further Information

{"title"=>"Clustom-Cloud: In-Memory data grid-based software for clustering 16S rRNA sequence data in the cloud environment", "type"=>"journal", "authors"=>[{"first_name"=>"Jeongsu", "last_name"=>"Oh", "scopus_author_id"=>"23478094100"}, {"first_name"=>"Chi Hwan", "last_name"=>"Choi", "scopus_author_id"=>"55698050300"}, {"first_name"=>"Min Kyu", "last_name"=>"Park", "scopus_author_id"=>"7404490480"}, {"first_name"=>"Byung Kwon", "last_name"=>"Kim", "scopus_author_id"=>"34769539400"}, {"first_name"=>"Kyuin", "last_name"=>"Hwang", "scopus_author_id"=>"55674299000"}, {"first_name"=>"Sang Heon", "last_name"=>"Lee", "scopus_author_id"=>"55925082500"}, {"first_name"=>"Soon Gyu", "last_name"=>"Hong", "scopus_author_id"=>"7405764719"}, {"first_name"=>"Arshan", "last_name"=>"Nasir", "scopus_author_id"=>"54787967600"}, {"first_name"=>"Wan Sup", "last_name"=>"Cho", "scopus_author_id"=>"7401774688"}, {"first_name"=>"Kyung Mo", "last_name"=>"Kim", "scopus_author_id"=>"36101192800"}], "year"=>2016, "source"=>"PLoS ONE", "identifiers"=>{"scopus"=>"2-s2.0-84961154581", "doi"=>"10.1371/journal.pone.0151064", "sgr"=>"84961154581", "pmid"=>"26954507", "issn"=>"19326203", "pui"=>"609034763"}, "id"=>"875e06c6-79d6-3984-b54c-a04874139f6c", "abstract"=>"High-throughput sequencing can produce hundreds of thousands of 16S rRNA sequence reads corresponding to different organisms present in the environmental samples. Typically, analysis of microbial diversity in bioinformatics starts from pre-processing followed by clustering 16S rRNA reads into relatively fewer operational taxonomic units (OTUs). The OTUs are reliable indicators of microbial diversity and greatly accelerate the downstream analysis time. However, existing hierarchical clustering algorithms that are generally more accurate than greedy heuristic algorithms struggle with large sequence datasets. To keep pace with the rapid rise in sequencing data, we present CLUSTOM-CLOUD, which is the first distributed sequence clustering program based on In-Memory Data Grid (IMDG) technology-a distributed data structure to store all data in the main memory of multiple computing nodes. The IMDG technology helps CLUSTOM-CLOUD to enhance both its capability of handling larger datasets and its computational scalability better than its ancestor, CLUSTOM, while maintaining high accuracy. Clustering speed of CLUSTOM-CLOUD was evaluated on published 16S rRNA human microbiome sequence datasets using the small laboratory cluster (10 nodes) and under the Amazon EC2 cloud-computing environments. Under the laboratory environment, it required only ~3 hours to process dataset of size 200 K reads regardless of the complexity of the human microbiome data. In turn, one million reads were processed in approximately 20, 14, and 11 hours when utilizing 20, 30, and 40 nodes on the Amazon EC2 cloud-computing environment. The running time evaluation indicates that CLUSTOM-CLOUD can handle much larger sequence datasets than CLUSTOM and is also a scalable distributed processing system. The comparative accuracy test using 16S rRNA pyrosequences of a mock community shows that CLUSTOM-CLOUD achieves higher accuracy than DOTUR, mothur, ESPRIT-Tree, UCLUST and Swarm. CLUSTOM-CLOUD is written in JAVA and is freely available at http://clustomcloud.kopri.re.kr.", "link"=>"http://www.mendeley.com/research/clustomcloud-inmemory-data-gridbased-software-clustering-16s-rrna-sequence-data-cloud-environment", "reader_count"=>15, "reader_count_by_academic_status"=>{"Professor > Associate Professor"=>8, "Researcher"=>2, "Student > Ph. D. Student"=>2, "Student > Master"=>2, "Student > Bachelor"=>1}, "reader_count_by_user_role"=>{"Professor > Associate Professor"=>8, "Researcher"=>2, "Student > Ph. D. Student"=>2, "Student > Master"=>2, "Student > Bachelor"=>1}, "reader_count_by_subject_area"=>{"Unspecified"=>1, "Engineering"=>1, "Environmental Science"=>1, "Agricultural and Biological Sciences"=>1, "Social Sciences"=>1, "Computer Science"=>8, "Immunology and Microbiology"=>1, "Economics, Econometrics and Finance"=>1}, "reader_count_by_subdiscipline"=>{"Engineering"=>{"Engineering"=>1}, "Social Sciences"=>{"Social Sciences"=>1}, "Immunology and Microbiology"=>{"Immunology and Microbiology"=>1}, "Economics, Econometrics and Finance"=>{"Economics, Econometrics and Finance"=>1}, "Agricultural and Biological Sciences"=>{"Agricultural and Biological Sciences"=>1}, "Computer Science"=>{"Computer Science"=>8}, "Unspecified"=>{"Unspecified"=>1}, "Environmental Science"=>{"Environmental Science"=>1}}, "group_count"=>0}
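
The abstract in the record above quotes approximate Amazon EC2 running times for one million reads (about 20, 14, and 11 hours on 20, 30, and 40 nodes). As a rough worked example, the sketch below turns those three figures into relative speedup and parallel efficiency, taking the 20-node run as the baseline. The function name `relative_scaling` and the choice of baseline are illustrative assumptions, not something reported in the source, and the inputs are the rounded values from the abstract.

```python
# Approximate wall-clock hours for one million reads on Amazon EC2,
# as quoted in the abstract: {node count: hours}. Values are rounded.
runs = {20: 20.0, 30: 14.0, 40: 11.0}


def relative_scaling(runs, baseline_nodes):
    """Return {nodes: (speedup, efficiency)} relative to the baseline run.

    speedup    = baseline hours / hours at this node count
    efficiency = speedup / (nodes / baseline nodes), i.e. the fraction
                 of ideal linear scaling that was achieved.
    """
    base_hours = runs[baseline_nodes]
    result = {}
    for nodes, hours in sorted(runs.items()):
        speedup = base_hours / hours
        ideal = nodes / baseline_nodes
        result[nodes] = (speedup, speedup / ideal)
    return result


scaling = relative_scaling(runs, baseline_nodes=20)
for nodes, (speedup, efficiency) in scaling.items():
    print(f"{nodes} nodes: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

With these rounded inputs, doubling the node count from 20 to 40 gives a speedup of roughly 1.8x, which is consistent with the abstract's description of the system as scalable.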

Scopus | Further Information

{"@_fa"=>"true", "link"=>[{"@_fa"=>"true", "@ref"=>"self", "@href"=>"https://api.elsevier.com/content/abstract/scopus_id/84961154581"}, {"@_fa"=>"true", "@ref"=>"author-affiliation", "@href"=>"https://api.elsevier.com/content/abstract/scopus_id/84961154581?field=author,affiliation"}, {"@_fa"=>"true", "@ref"=>"scopus", "@href"=>"https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=84961154581&origin=inward"}, {"@_fa"=>"true", "@ref"=>"scopus-citedby", "@href"=>"https://www.scopus.com/inward/citedby.uri?partnerID=HzOxMe3b&scp=84961154581&origin=inward"}], "prism:url"=>"https://api.elsevier.com/content/abstract/scopus_id/84961154581", "dc:identifier"=>"SCOPUS_ID:84961154581", "eid"=>"2-s2.0-84961154581", "dc:title"=>"Clustom-Cloud: In-Memory data grid-based software for clustering 16S rRNA sequence data in the cloud environment", "dc:creator"=>"Oh J.", "prism:publicationName"=>"PLoS ONE", "prism:eIssn"=>"19326203", "prism:volume"=>"11", "prism:issueIdentifier"=>"3", "prism:pageRange"=>nil, "prism:coverDate"=>"2016-03-01", "prism:coverDisplayDate"=>"March 2016", "prism:doi"=>"10.1371/journal.pone.0151064", "citedby-count"=>"5", "affiliation"=>[{"@_fa"=>"true", "affilname"=>"Korea Research Institute of Bioscience and Biotechnology", "affiliation-city"=>"Yusong", "affiliation-country"=>"South Korea"}], "pubmed-id"=>"26954507", "prism:aggregationType"=>"Journal", "subtype"=>"ar", "subtypeDescription"=>"Article", "article-number"=>"e0151064", "source-id"=>"10600153309", "openaccess"=>"1", "openaccessFlag"=>true}

Facebook

  • {"url"=>"http%3A%2F%2Fjournals.plos.org%2Fplosone%2Farticle%3Fid%3D10.1371%252Fjournal.pone.0151064", "share_count"=>0, "like_count"=>0, "comment_count"=>0, "click_count"=>0, "total_count"=>0}

Counter

  • {"month"=>"3", "year"=>"2016", "pdf_views"=>"41", "xml_views"=>"0", "html_views"=>"646"}
  • {"month"=>"4", "year"=>"2016", "pdf_views"=>"21", "xml_views"=>"1", "html_views"=>"122"}
  • {"month"=>"5", "year"=>"2016", "pdf_views"=>"13", "xml_views"=>"0", "html_views"=>"60"}
  • {"month"=>"6", "year"=>"2016", "pdf_views"=>"8", "xml_views"=>"0", "html_views"=>"102"}
  • {"month"=>"7", "year"=>"2016", "pdf_views"=>"5", "xml_views"=>"0", "html_views"=>"38"}
  • {"month"=>"8", "year"=>"2016", "pdf_views"=>"6", "xml_views"=>"0", "html_views"=>"24"}
  • {"month"=>"9", "year"=>"2016", "pdf_views"=>"12", "xml_views"=>"0", "html_views"=>"41"}
  • {"month"=>"10", "year"=>"2016", "pdf_views"=>"8", "xml_views"=>"0", "html_views"=>"41"}
  • {"month"=>"11", "year"=>"2016", "pdf_views"=>"1", "xml_views"=>"0", "html_views"=>"35"}
  • {"month"=>"12", "year"=>"2016", "pdf_views"=>"9", "xml_views"=>"0", "html_views"=>"36"}
  • {"month"=>"1", "year"=>"2017", "pdf_views"=>"5", "xml_views"=>"0", "html_views"=>"28"}
  • {"month"=>"2", "year"=>"2017", "pdf_views"=>"1", "xml_views"=>"0", "html_views"=>"34"}
  • {"month"=>"3", "year"=>"2017", "pdf_views"=>"5", "xml_views"=>"0", "html_views"=>"35"}
  • {"month"=>"4", "year"=>"2017", "pdf_views"=>"7", "xml_views"=>"0", "html_views"=>"74"}
  • {"month"=>"5", "year"=>"2017", "pdf_views"=>"5", "xml_views"=>"0", "html_views"=>"45"}
  • {"month"=>"6", "year"=>"2017", "pdf_views"=>"3", "xml_views"=>"0", "html_views"=>"27"}
  • {"month"=>"7", "year"=>"2017", "pdf_views"=>"16", "xml_views"=>"3", "html_views"=>"30"}
  • {"month"=>"8", "year"=>"2017", "pdf_views"=>"3", "xml_views"=>"2", "html_views"=>"32"}
  • {"month"=>"9", "year"=>"2017", "pdf_views"=>"2", "xml_views"=>"1", "html_views"=>"28"}
  • {"month"=>"10", "year"=>"2017", "pdf_views"=>"5", "xml_views"=>"1", "html_views"=>"55"}
  • {"month"=>"11", "year"=>"2017", "pdf_views"=>"3", "xml_views"=>"0", "html_views"=>"138"}
  • {"month"=>"12", "year"=>"2017", "pdf_views"=>"1", "xml_views"=>"1", "html_views"=>"220"}
  • {"month"=>"1", "year"=>"2018", "pdf_views"=>"4", "xml_views"=>"0", "html_views"=>"27"}
  • {"month"=>"2", "year"=>"2018", "pdf_views"=>"2", "xml_views"=>"0", "html_views"=>"7"}
  • {"month"=>"3", "year"=>"2018", "pdf_views"=>"1", "xml_views"=>"2", "html_views"=>"9"}
  • {"month"=>"4", "year"=>"2018", "pdf_views"=>"4", "xml_views"=>"0", "html_views"=>"12"}
  • {"month"=>"5", "year"=>"2018", "pdf_views"=>"4", "xml_views"=>"0", "html_views"=>"8"}
  • {"month"=>"6", "year"=>"2018", "pdf_views"=>"7", "xml_views"=>"1", "html_views"=>"15"}
  • {"month"=>"7", "year"=>"2018", "pdf_views"=>"7", "xml_views"=>"6", "html_views"=>"6"}
  • {"month"=>"8", "year"=>"2018", "pdf_views"=>"6", "xml_views"=>"1", "html_views"=>"16"}
  • {"month"=>"9", "year"=>"2018", "pdf_views"=>"8", "xml_views"=>"0", "html_views"=>"7"}
  • {"month"=>"10", "year"=>"2018", "pdf_views"=>"6", "xml_views"=>"2", "html_views"=>"4"}
  • {"month"=>"11", "year"=>"2018", "pdf_views"=>"1", "xml_views"=>"0", "html_views"=>"1"}

Figshare

  • {"files"=>["https://ndownloader.figshare.com/files/4825321"], "description"=>"<p>Running time (A) and memory usage (B) of CLUSTOM-CLOUD were measured by analyzing 50 K, 100 K, 150 K, and 200 K sequences in <i>high</i>-, <i>intermediate</i>-, and <i>low</i>-complexity datasets (3% distance threshold). The measures were repeated three times per dataset and the average values are plotted.</p>", "links"=>[], "tags"=>["microbiome sequence datasets", "nod", "sequencing", "DOTUR", "laboratory", "Amazon", "EC", "CLUSTOM", "16 S rRNA sequence", "UCLUST", "IMDG", "Clustering 16 S rRNA Sequence Data", "diversity", "size 200 K", "sequence datasets", "heuristic algorithms struggle", "16 S rRNA pyrosequences", "technology", "accuracy", "JAVA", "analysis", "16 S rRNA", "OTU"], "article_id"=>3104161, "categories"=>["Genetics", "Molecular Biology", "Evolutionary Biology", "Ecology", "Immunology", "Biological Sciences not elsewhere classified", "Information Systems not elsewhere classified"], "users"=>["Jeongsu Oh", "Chi-Hwan Choi", "Min-Kyu Park", "Byung Kwon Kim", "Kyuin Hwang", "Sang-Heon Lee", "Soon Gyu Hong", "Arshan Nasir", "Wan-Sup Cho", "Kyung Mo Kim"], "doi"=>"https://dx.doi.org/10.1371/journal.pone.0151064.g006", "stats"=>{"downloads"=>0, "page_views"=>0, "likes"=>0}, "figshare_url"=>"https://figshare.com/articles/Running_time_of_the_whole_process_according_to_the_complexity_of_the_microbial_diversity_/3104161", "title"=>"Running time of the whole process according to the complexity of the microbial diversity.", "pos_in_sequence"=>7, "defined_type"=>1, "published_date"=>"2016-03-08 08:20:38"}
  • {"files"=>["https://ndownloader.figshare.com/files/4825297"], "description"=>"<p>16S rRNA sequences in FASTA format are provided as input. Each input file, already checked for low-quality and chimera errors, is pre-processed by the removal of duplicates and transformation of <i>k</i>-mer into numeric values. A fixed number of sequence pairs are distributed to each cluster node for <i>k</i>-mer (initial) and NW (refinement) distance calculation. Processed results are merged upon completion of each unit task. Clusters are determined based on criteria described previously [<a href=\"http://www.plosone.org/article/info:doi/10.1371/journal.pone.0151064#pone.0151064.ref011\" target=\"_blank\">11</a>] and in text. Output files are created and data are cleared from memory.</p>", "links"=>[], "tags"=>["microbiome sequence datasets", "nod", "sequencing", "DOTUR", "laboratory", "Amazon", "EC", "CLUSTOM", "16 S rRNA sequence", "UCLUST", "IMDG", "Clustering 16 S rRNA Sequence Data", "diversity", "size 200 K", "sequence datasets", "heuristic algorithms struggle", "16 S rRNA pyrosequences", "technology", "accuracy", "JAVA", "analysis", "16 S rRNA", "OTU"], "article_id"=>3104134, "categories"=>["Genetics", "Molecular Biology", "Evolutionary Biology", "Ecology", "Immunology", "Biological Sciences not elsewhere classified", "Information Systems not elsewhere classified"], "users"=>["Jeongsu Oh", "Chi-Hwan Choi", "Min-Kyu Park", "Byung Kwon Kim", "Kyuin Hwang", "Sang-Heon Lee", "Soon Gyu Hong", "Arshan Nasir", "Wan-Sup Cho", "Kyung Mo Kim"], "doi"=>"https://dx.doi.org/10.1371/journal.pone.0151064.g002", "stats"=>{"downloads"=>0, "page_views"=>0, "likes"=>0}, "figshare_url"=>"https://figshare.com/articles/Schematic_diagram_of_clustering_workflow_/3104134", "title"=>"Schematic diagram of clustering workflow.", "pos_in_sequence"=>3, "defined_type"=>1, "published_date"=>"2016-03-08 08:20:38"}
  • {"files"=>["https://ndownloader.figshare.com/files/4825333"], "description"=>"<p>The clustering accuracy of CLUSTOM, CLUSTOM-CLOUD, DOTUR-AL-PSA, ESPRIT-Tree, mothur-AL-PSA, mothur-AL-MSA, UCLUST and Swarm was performed based on 16S rRNA pyrosequences of a mock community that was constructed by pooled DNA of 21 human-associated prokaryotic strains with even concentration (HMP-Mock-community). The precision and recall metrics as well as their <i>F</i><sub><i>2</i></sub> values were used to compare the clustering accuracy of the eight programs at the species (A) and genus (B) levels.</p>", "links"=>[], "tags"=>["microbiome sequence datasets", "nod", "sequencing", "DOTUR", "laboratory", "Amazon", "EC", "CLUSTOM", "16 S rRNA sequence", "UCLUST", "IMDG", "Clustering 16 S rRNA Sequence Data", "diversity", "size 200 K", "sequence datasets", "heuristic algorithms struggle", "16 S rRNA pyrosequences", "technology", "accuracy", "JAVA", "analysis", "16 S rRNA", "OTU"], "article_id"=>3104170, "categories"=>["Genetics", "Molecular Biology", "Evolutionary Biology", "Ecology", "Immunology", "Biological Sciences not elsewhere classified", "Information Systems not elsewhere classified"], "users"=>["Jeongsu Oh", "Chi-Hwan Choi", "Min-Kyu Park", "Byung Kwon Kim", "Kyuin Hwang", "Sang-Heon Lee", "Soon Gyu Hong", "Arshan Nasir", "Wan-Sup Cho", "Kyung Mo Kim"], "doi"=>"https://dx.doi.org/10.1371/journal.pone.0151064.g007", "stats"=>{"downloads"=>0, "page_views"=>0, "likes"=>0}, "figshare_url"=>"https://figshare.com/articles/Comparative_accuracy_test_of_existing_clustering_programs_/3104170", "title"=>"Comparative accuracy test of existing clustering programs.", "pos_in_sequence"=>8, "defined_type"=>1, "published_date"=>"2016-03-08 08:20:38"}
  • {"files"=>["https://ndownloader.figshare.com/files/4825255", "https://ndownloader.figshare.com/files/4825261", "https://ndownloader.figshare.com/files/4825264", "https://ndownloader.figshare.com/files/4825267"], "description"=>"<div><p>High-throughput sequencing can produce hundreds of thousands of 16S rRNA sequence reads corresponding to different organisms present in the environmental samples. Typically, analysis of microbial diversity in bioinformatics starts from pre-processing followed by clustering 16S rRNA reads into relatively fewer operational taxonomic units (OTUs). The OTUs are reliable indicators of microbial diversity and greatly accelerate the downstream analysis time. However, existing hierarchical clustering algorithms that are generally more accurate than greedy heuristic algorithms struggle with large sequence datasets. To keep pace with the rapid rise in sequencing data, we present CLUSTOM-CLOUD, which is the first distributed sequence clustering program based on In-Memory Data Grid (IMDG) technology–a distributed data structure to store all data in the main memory of multiple computing nodes. The IMDG technology helps CLUSTOM-CLOUD to enhance both its capability of handling larger datasets and its computational scalability better than its ancestor, CLUSTOM, while maintaining high accuracy. Clustering speed of CLUSTOM-CLOUD was evaluated on published 16S rRNA human microbiome sequence datasets using the small laboratory cluster (10 nodes) and under the Amazon EC2 cloud-computing environments. Under the laboratory environment, it required only ~3 hours to process dataset of size 200 K reads regardless of the complexity of the human microbiome data. In turn, one million reads were processed in approximately 20, 14, and 11 hours when utilizing 20, 30, and 40 nodes on the Amazon EC2 cloud-computing environment. The running time evaluation indicates that CLUSTOM-CLOUD can handle much larger sequence datasets than CLUSTOM and is also a scalable distributed processing system. The comparative accuracy test using 16S rRNA pyrosequences of a mock community shows that CLUSTOM-CLOUD achieves higher accuracy than DOTUR, mothur, ESPRIT-Tree, UCLUST and Swarm. CLUSTOM-CLOUD is written in JAVA and is freely available at <a href=\"http://clustomcloud.kopri.re.kr/\" target=\"_blank\">http://clustomcloud.kopri.re.kr</a>.</p></div>", "links"=>[], "tags"=>["microbiome sequence datasets", "nod", "sequencing", "DOTUR", "laboratory", "Amazon", "EC", "CLUSTOM", "16 S rRNA sequence", "UCLUST", "IMDG", "Clustering 16 S rRNA Sequence Data", "diversity", "size 200 K", "sequence datasets", "heuristic algorithms struggle", "16 S rRNA pyrosequences", "technology", "accuracy", "JAVA", "analysis", "16 S rRNA", "OTU"], "article_id"=>3104110, "categories"=>["Genetics", "Molecular Biology", "Evolutionary Biology", "Ecology", "Immunology", "Biological Sciences not elsewhere classified", "Information Systems not elsewhere classified"], "users"=>["Jeongsu Oh", "Chi-Hwan Choi", "Min-Kyu Park", "Byung Kwon Kim", "Kyuin Hwang", "Sang-Heon Lee", "Soon Gyu Hong", "Arshan Nasir", "Wan-Sup Cho", "Kyung Mo Kim"], "doi"=>["https://dx.doi.org/10.1371/journal.pone.0151064.s001", "https://dx.doi.org/10.1371/journal.pone.0151064.s002", "https://dx.doi.org/10.1371/journal.pone.0151064.s003", "https://dx.doi.org/10.1371/journal.pone.0151064.s004"], "stats"=>{"downloads"=>2, "page_views"=>0, "likes"=>0}, "figshare_url"=>"https://figshare.com/articles/CLUSTOM_CLOUD_In_Memory_Data_Grid_Based_Software_for_Clustering_16S_rRNA_Sequence_Data_in_the_Cloud_Environment/3104110", "title"=>"CLUSTOM-CLOUD: In-Memory Data Grid-Based Software for Clustering 16S rRNA Sequence Data in the Cloud Environment", "pos_in_sequence"=>1, "defined_type"=>4, "published_date"=>"2016-03-08 08:20:38"}
  • {"files"=>["https://ndownloader.figshare.com/files/4825303"], "description"=>"<p>The diagram summarizes the layout of the <i>k</i>-mer transformation method. (A) All <i>k</i>-mer strings in the input sequence dataset along with non-redundant numeric values are loaded into hash map. (B) All <i>k</i>-mer in each sequence are replaced with numeric values corresponding to each key in hash map.</p>", "links"=>[], "tags"=>["microbiome sequence datasets", "nod", "sequencing", "DOTUR", "laboratory", "Amazon", "EC", "CLUSTOM", "16 S rRNA sequence", "UCLUST", "IMDG", "Clustering 16 S rRNA Sequence Data", "diversity", "size 200 K", "sequence datasets", "heuristic algorithms struggle", "16 S rRNA pyrosequences", "technology", "accuracy", "JAVA", "analysis", "16 S rRNA", "OTU"], "article_id"=>3104143, "categories"=>["Genetics", "Molecular Biology", "Evolutionary Biology", "Ecology", "Immunology", "Biological Sciences not elsewhere classified", "Information Systems not elsewhere classified"], "users"=>["Jeongsu Oh", "Chi-Hwan Choi", "Min-Kyu Park", "Byung Kwon Kim", "Kyuin Hwang", "Sang-Heon Lee", "Soon Gyu Hong", "Arshan Nasir", "Wan-Sup Cho", "Kyung Mo Kim"], "doi"=>"https://dx.doi.org/10.1371/journal.pone.0151064.g003", "stats"=>{"downloads"=>0, "page_views"=>0, "likes"=>0}, "figshare_url"=>"https://figshare.com/articles/Representation_of_i_k_i_mer_transformation_method_/3104143", "title"=>"Representation of <i>k</i>-mer transformation method.", "pos_in_sequence"=>4, "defined_type"=>1, "published_date"=>"2016-03-08 08:20:38"}
  • {"files"=>["https://ndownloader.figshare.com/files/4825342"], "description"=>"<p>CLUSTOM-CLOUD running time for each step according to the complexity of the microbial diversity<a href=\"http://www.plosone.org/article/info:doi/10.1371/journal.pone.0151064#t001fn001\" target=\"_blank\"><sup>a</sup></a>.</p>", "links"=>[], "tags"=>["microbiome sequence datasets", "nod", "sequencing", "DOTUR", "laboratory", "Amazon", "EC", "CLUSTOM", "16 S rRNA sequence", "UCLUST", "IMDG", "Clustering 16 S rRNA Sequence Data", "diversity", "size 200 K", "sequence datasets", "heuristic algorithms struggle", "16 S rRNA pyrosequences", "technology", "accuracy", "JAVA", "analysis", "16 S rRNA", "OTU"], "article_id"=>3104179, "categories"=>["Genetics", "Molecular Biology", "Evolutionary Biology", "Ecology", "Immunology", "Biological Sciences not elsewhere classified", "Information Systems not elsewhere classified"], "users"=>["Jeongsu Oh", "Chi-Hwan Choi", "Min-Kyu Park", "Byung Kwon Kim", "Kyuin Hwang", "Sang-Heon Lee", "Soon Gyu Hong", "Arshan Nasir", "Wan-Sup Cho", "Kyung Mo Kim"], "doi"=>"https://dx.doi.org/10.1371/journal.pone.0151064.t001", "stats"=>{"downloads"=>0, "page_views"=>2, "likes"=>0}, "figshare_url"=>"https://figshare.com/articles/CLUSTOM_CLOUD_running_time_for_each_step_according_to_the_complexity_of_the_microbial_diversity_sup_a_sup_/3104179", "title"=>"CLUSTOM-CLOUD running time for each step according to the complexity of the microbial diversity<sup>a</sup>.", "pos_in_sequence"=>9, "defined_type"=>3, "published_date"=>"2016-03-08 08:20:38"}
  • {"files"=>["https://ndownloader.figshare.com/files/4825309"], "description"=>"<p>The figure summarizes the workflow of distributed processing in CLUSTOM-CLOUD. (A) The number of all possible sequence pairs that need to be compared for distance calculation is represented as a right-angled triangle; <i>n</i> represents the total number of sequences. (B) A chunk-size based on system granularity is determined to distribute only a fixed number of sequence pairs (shown here with 2 K) to each cluster node. (C) Each task (e.g., T<sub><i>i</i></sub>) is assigned to nodes from top to bottom and left to right. (D) Each node takes and processes tasks in the order of task priority. (E) The assigned task (T<sub><i>i</i></sub>) is divided into smaller sub-tasks (t<sub><i>j</i></sub>) and processed in parallel using multi-threads (w<sub>k</sub>) depending on the number of threads on the cluster node.</p>", "links"=>[], "tags"=>["microbiome sequence datasets", "nod", "sequencing", "DOTUR", "laboratory", "Amazon", "EC", "CLUSTOM", "16 S rRNA sequence", "UCLUST", "IMDG", "Clustering 16 S rRNA Sequence Data", "diversity", "size 200 K", "sequence datasets", "heuristic algorithms struggle", "16 S rRNA pyrosequences", "technology", "accuracy", "JAVA", "analysis", "16 S rRNA", "OTU"], "article_id"=>3104149, "categories"=>["Genetics", "Molecular Biology", "Evolutionary Biology", "Ecology", "Immunology", "Biological Sciences not elsewhere classified", "Information Systems not elsewhere classified"], "users"=>["Jeongsu Oh", "Chi-Hwan Choi", "Min-Kyu Park", "Byung Kwon Kim", "Kyuin Hwang", "Sang-Heon Lee", "Soon Gyu Hong", "Arshan Nasir", "Wan-Sup Cho", "Kyung Mo Kim"], "doi"=>"https://dx.doi.org/10.1371/journal.pone.0151064.g004", "stats"=>{"downloads"=>0, "page_views"=>0, "likes"=>0}, "figshare_url"=>"https://figshare.com/articles/Fine_grained_task_distribution_in_CLUSTOM_CLOUD_/3104149", "title"=>"Fine-grained task distribution in CLUSTOM-CLOUD.", "pos_in_sequence"=>5, "defined_type"=>1, "published_date"=>"2016-03-08 08:20:38"}
  • {"files"=>["https://ndownloader.figshare.com/files/4825315"], "description"=>"<p>Comparison of the memory usage (A) and running time (B) were performed with and without <i>k</i>-mer transformation method only at the <i>k</i>-mer distance calculation step. Two of 100K 16S sequences were independently and randomly extracted from the sequence datasets of <i>high</i>-, <i>intermediate</i>- and <i>low</i>-complexity. For each of the six different sequence datasets, the running time and memory usage were measured three times independently.</p>", "links"=>[], "tags"=>["microbiome sequence datasets", "nod", "sequencing", "DOTUR", "laboratory", "Amazon", "EC", "CLUSTOM", "16 S rRNA sequence", "UCLUST", "IMDG", "Clustering 16 S rRNA Sequence Data", "diversity", "size 200 K", "sequence datasets", "heuristic algorithms struggle", "16 S rRNA pyrosequences", "technology", "accuracy", "JAVA", "analysis", "16 S rRNA", "OTU"], "article_id"=>3104155, "categories"=>["Genetics", "Molecular Biology", "Evolutionary Biology", "Ecology", "Immunology", "Biological Sciences not elsewhere classified", "Information Systems not elsewhere classified"], "users"=>["Jeongsu Oh", "Chi-Hwan Choi", "Min-Kyu Park", "Byung Kwon Kim", "Kyuin Hwang", "Sang-Heon Lee", "Soon Gyu Hong", "Arshan Nasir", "Wan-Sup Cho", "Kyung Mo Kim"], "doi"=>"https://dx.doi.org/10.1371/journal.pone.0151064.g005", "stats"=>{"downloads"=>0, "page_views"=>0, "likes"=>0}, "figshare_url"=>"https://figshare.com/articles/Running_time_and_memory_usage_evaluation_of_the_i_k_i_mer_transformation_method_/3104155", "title"=>"Running time and memory usage evaluation of the <i>k</i>-mer transformation method.", "pos_in_sequence"=>6, "defined_type"=>1, "published_date"=>"2016-03-08 08:20:38"}
  • {"files"=>["https://ndownloader.figshare.com/files/4825351"], "description"=>"<p>Time and cost of running one million reads on CLUSTOM-CLOUD.</p>", "links"=>[], "tags"=>["microbiome sequence datasets", "nod", "sequencing", "DOTUR", "laboratory", "Amazon", "EC", "CLUSTOM", "16 S rRNA sequence", "UCLUST", "IMDG", "Clustering 16 S rRNA Sequence Data", "diversity", "size 200 K", "sequence datasets", "heuristic algorithms struggle", "16 S rRNA pyrosequences", "technology", "accuracy", "JAVA", "analysis", "16 S rRNA", "OTU"], "article_id"=>3104188, "categories"=>["Genetics", "Molecular Biology", "Evolutionary Biology", "Ecology", "Immunology", "Biological Sciences not elsewhere classified", "Information Systems not elsewhere classified"], "users"=>["Jeongsu Oh", "Chi-Hwan Choi", "Min-Kyu Park", "Byung Kwon Kim", "Kyuin Hwang", "Sang-Heon Lee", "Soon Gyu Hong", "Arshan Nasir", "Wan-Sup Cho", "Kyung Mo Kim"], "doi"=>"https://dx.doi.org/10.1371/journal.pone.0151064.t002", "stats"=>{"downloads"=>0, "page_views"=>2, "likes"=>0}, "figshare_url"=>"https://figshare.com/articles/Time_and_cost_of_running_one_million_reads_on_CLUSTOM_CLOUD_/3104188", "title"=>"Time and cost of running one million reads on CLUSTOM-CLOUD.", "pos_in_sequence"=>10, "defined_type"=>3, "published_date"=>"2016-03-08 08:20:38"}
  • {"files"=>["https://ndownloader.figshare.com/files/4825285"], "description"=>"<p>CLUSTOM-CLOUD consists of <i>Application</i> and <i>Cluster</i> units. <i>Application</i> is composed of <i>Job Tracker</i> and <i>Data Manager</i>. <i>Job Tracker</i> assigns <i>Task Tracker</i> to each <i>Cluster Node</i> in <i>Cluster</i> and checks its status. <i>Task Tracker</i> processes a distributed task in parallel using multi-threads. Data manager manages processed results and generates clustering results. <i>Cluster</i> is a set of <i>N</i>-nodes, which are unified by IMDG. <i>Cluster</i> is composed of <i>Cluster Node</i> and <i>Task Tracker</i>. A part of RAM in each <i>Cluster Node</i> is assigned to IMDG data structure and backup area.</p>", "links"=>[], "tags"=>["microbiome sequence datasets", "nod", "sequencing", "DOTUR", "laboratory", "Amazon", "EC", "CLUSTOM", "16 S rRNA sequence", "UCLUST", "IMDG", "Clustering 16 S rRNA Sequence Data", "diversity", "size 200 K", "sequence datasets", "heuristic algorithms struggle", "16 S rRNA pyrosequences", "technology", "accuracy", "JAVA", "analysis", "16 S rRNA", "OTU"], "article_id"=>3104125, "categories"=>["Genetics", "Molecular Biology", "Evolutionary Biology", "Ecology", "Immunology", "Biological Sciences not elsewhere classified", "Information Systems not elsewhere classified"], "users"=>["Jeongsu Oh", "Chi-Hwan Choi", "Min-Kyu Park", "Byung Kwon Kim", "Kyuin Hwang", "Sang-Heon Lee", "Soon Gyu Hong", "Arshan Nasir", "Wan-Sup Cho", "Kyung Mo Kim"], "doi"=>"https://dx.doi.org/10.1371/journal.pone.0151064.g001", "stats"=>{"downloads"=>0, "page_views"=>4, "likes"=>0}, "figshare_url"=>"https://figshare.com/articles/System_architecture_/3104125", "title"=>"System architecture.", "pos_in_sequence"=>2, "defined_type"=>1, "published_date"=>"2016-03-08 08:20:38"}
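
One of the Figshare figure descriptions above outlines the k-mer transformation method in two steps: (A) every k-mer string in the input dataset is loaded into a hash map with a non-redundant numeric value, and (B) the k-mers of each sequence are replaced by those numeric values. The following is a minimal sketch of that idea only. The function names, the value of `k`, the toy reads, and the rule of assigning IDs in order of first appearance are all illustrative assumptions; the actual CLUSTOM-CLOUD implementation is written in Java and is not shown in this record.

```python
def build_kmer_index(sequences, k):
    """Step A (sketch): map every distinct k-mer in the dataset to a unique integer.

    Assumption: IDs are assigned in order of first appearance.
    """
    index = {}
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if kmer not in index:
                index[kmer] = len(index)
    return index


def encode(seq, index, k):
    """Step B (sketch): replace each k-mer in one sequence with its integer code."""
    return [index[seq[i:i + k]] for i in range(len(seq) - k + 1)]


# Hypothetical toy input; real inputs would be 16S rRNA reads in FASTA format.
reads = ["ACGTAC", "CGTACG"]
k = 3
index = build_kmer_index(reads, k)
encoded = [encode(read, index, k) for read in reads]
print(index)    # {'ACG': 0, 'CGT': 1, 'GTA': 2, 'TAC': 3}
print(encoded)  # [[0, 1, 2, 3], [1, 2, 3, 0]]
```

Comparing lists of small integers rather than substrings is one plausible reason such a transformation could reduce memory use and running time at the k-mer distance step, which is the effect the corresponding figure description reports measuring.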

PMC Usage Stats | Further Information

  • {"unique-ip"=>"1", "full-text"=>"0", "pdf"=>"1", "abstract"=>"0", "scanned-summary"=>"0", "scanned-page-browse"=>"0", "figure"=>"0", "supp-data"=>"0", "cited-by"=>"0", "year"=>"2016", "month"=>"3"}
  • {"unique-ip"=>"18", "full-text"=>"21", "pdf"=>"6", "abstract"=>"0", "scanned-summary"=>"0", "scanned-page-browse"=>"0", "figure"=>"1", "supp-data"=>"0", "cited-by"=>"0", "year"=>"2016", "month"=>"4"}
  • {"unique-ip"=>"7", "full-text"=>"9", "pdf"=>"2", "abstract"=>"0", "scanned-summary"=>"0", "scanned-page-browse"=>"0", "figure"=>"0", "supp-data"=>"1", "cited-by"=>"0", "year"=>"2016", "month"=>"5"}
  • {"unique-ip"=>"6", "full-text"=>"11", "pdf"=>"1", "abstract"=>"0", "scanned-summary"=>"0", "scanned-page-browse"=>"0", "figure"=>"1", "supp-data"=>"0", "cited-by"=>"0", "year"=>"2016", "month"=>"6"}
  • {"unique-ip"=>"8", "full-text"=>"6", "pdf"=>"2", "abstract"=>"0", "scanned-summary"=>"0", "scanned-page-browse"=>"0", "figure"=>"3", "supp-data"=>"0", "cited-by"=>"0", "year"=>"2016", "month"=>"7"}
  • {"unique-ip"=>"5", "full-text"=>"8", "pdf"=>"1", "abstract"=>"0", "scanned-summary"=>"0", "scanned-page-browse"=>"0", "figure"=>"1", "supp-data"=>"0", "cited-by"=>"0", "year"=>"2016", "month"=>"8"}
  • {"unique-ip"=>"8", "full-text"=>"9", "pdf"=>"3", "abstract"=>"0", "scanned-summary"=>"0", "scanned-page-browse"=>"0", "figure"=>"1", "supp-data"=>"0", "cited-by"=>"0", "year"=>"2016", "month"=>"9"}
  • {"unique-ip"=>"18", "full-text"=>"18", "pdf"=>"6", "abstract"=>"0", "scanned-summary"=>"0", "scanned-page-browse"=>"0", "figure"=>"5", "supp-data"=>"0", "cited-by"=>"0", "year"=>"2016", "month"=>"10"}
  • {"unique-ip"=>"9", "full-text"=>"10", "pdf"=>"1", "abstract"=>"0", "scanned-summary"=>"0", "scanned-page-browse"=>"0", "figure"=>"1", "supp-data"=>"0", "cited-by"=>"0", "year"=>"2016", "month"=>"11"}
  • {"unique-ip"=>"6", "full-text"=>"8", "pdf"=>"0", "abstract"=>"0", "scanned-summary"=>"0", "scanned-page-browse"=>"0", "figure"=>"2", "supp-data"=>"0", "cited-by"=>"0", "year"=>"2016", "month"=>"12"}
  • {"unique-ip"=>"10", "full-text"=>"12", "pdf"=>"0", "abstract"=>"0", "scanned-summary"=>"0", "scanned-page-browse"=>"0", "figure"=>"14", "supp-data"=>"0", "cited-by"=>"0", "year"=>"2017", "month"=>"1"}
  • {"unique-ip"=>"8", "full-text"=>"6", "pdf"=>"3", "abstract"=>"0", "scanned-summary"=>"0", "scanned-page-browse"=>"0", "figure"=>"2", "supp-data"=>"0", "cited-by"=>"0", "year"=>"2017", "month"=>"2"}
  • {"unique-ip"=>"7", "full-text"=>"6", "pdf"=>"0", "abstract"=>"0", "scanned-summary"=>"0", "scanned-page-browse"=>"0", "figure"=>"1", "supp-data"=>"0", "cited-by"=>"0", "year"=>"2017", "month"=>"3"}
  • {"unique-ip"=>"9", "full-text"=>"7", "pdf"=>"3", "abstract"=>"0", "scanned-summary"=>"0", "scanned-page-browse"=>"0", "figure"=>"0", "supp-data"=>"0", "cited-by"=>"0", "year"=>"2017", "month"=>"4"}
  • {"unique-ip"=>"4", "full-text"=>"4", "pdf"=>"1", "abstract"=>"0", "scanned-summary"=>"0", "scanned-page-browse"=>"0", "figure"=>"0", "supp-data"=>"0", "cited-by"=>"0", "year"=>"2017", "month"=>"5"}
  • {"unique-ip"=>"5", "full-text"=>"4", "pdf"=>"3", "abstract"=>"0", "scanned-summary"=>"0", "scanned-page-browse"=>"0", "figure"=>"0", "supp-data"=>"0", "cited-by"=>"0", "year"=>"2017", "month"=>"6"}
  • {"unique-ip"=>"7", "full-text"=>"5", "pdf"=>"2", "abstract"=>"0", "scanned-summary"=>"0", "scanned-page-browse"=>"0", "figure"=>"5", "supp-data"=>"0", "cited-by"=>"0", "year"=>"2017", "month"=>"7"}
  • {"unique-ip"=>"5", "full-text"=>"5", "pdf"=>"0", "abstract"=>"0", "scanned-summary"=>"0", "scanned-page-browse"=>"0", "figure"=>"0", "supp-data"=>"0", "cited-by"=>"0", "year"=>"2017", "month"=>"8"}
  • {"unique-ip"=>"2", "full-text"=>"1", "pdf"=>"1", "abstract"=>"0", "scanned-summary"=>"0", "scanned-page-browse"=>"0", "figure"=>"0", "supp-data"=>"0", "cited-by"=>"0", "year"=>"2017", "month"=>"9"}
  • {"unique-ip"=>"9", "full-text"=>"9", "pdf"=>"3", "abstract"=>"0", "scanned-summary"=>"0", "scanned-page-browse"=>"0", "figure"=>"0", "supp-data"=>"0", "cited-by"=>"0", "year"=>"2017", "month"=>"10"}
  • {"unique-ip"=>"9", "full-text"=>"9", "pdf"=>"0", "abstract"=>"0", "scanned-summary"=>"0", "scanned-page-browse"=>"0", "figure"=>"0", "supp-data"=>"0", "cited-by"=>"0", "year"=>"2017", "month"=>"11"}
  • {"unique-ip"=>"10", "full-text"=>"11", "pdf"=>"3", "abstract"=>"0", "scanned-summary"=>"0", "scanned-page-browse"=>"0", "figure"=>"0", "supp-data"=>"0", "cited-by"=>"0", "year"=>"2017", "month"=>"12"}
  • {"unique-ip"=>"1", "full-text"=>"1", "pdf"=>"0", "abstract"=>"0", "scanned-summary"=>"0", "scanned-page-browse"=>"0", "figure"=>"0", "supp-data"=>"0", "cited-by"=>"0", "year"=>"2018", "month"=>"1"}
  • {"unique-ip"=>"6", "full-text"=>"4", "pdf"=>"2", "abstract"=>"0", "scanned-summary"=>"0", "scanned-page-browse"=>"0", "figure"=>"0", "supp-data"=>"0", "cited-by"=>"0", "year"=>"2018", "month"=>"3"}

Relative Metric

{"start_date"=>"2016-01-01T00:00:00Z", "end_date"=>"2016-12-31T00:00:00Z", "subject_areas"=>[]}