Article

Libra: scalable k-mer-based tool for massive all-vs-all metagenome comparisons

GIGASCIENCE (2019)

Journal

GIGASCIENCE

Volume 8, Issue 2, Pages -

Publisher

OXFORD UNIV PRESS

DOI: 10.1093/gigascience/giy165

Keywords

metagenomics; Hadoop; k-mer; distance metrics; clustering

Categories

Biology Multidisciplinary Sciences

Funding

National Science Foundation [1640775]
Office of Advanced Cyberinfrastructure (OAC)
Directorate for Computer and Information Science and Engineering [1640775] (funding source: National Science Foundation)

Abstract

Background: Shotgun metagenomics provides powerful insights into microbial community biodiversity and function. Yet, inferences from metagenomic studies are often limited by dataset size and complexity and are restricted by the availability and completeness of existing databases. De novo comparative metagenomics enables the comparison of metagenomes based on their total genetic content.

Results: We developed a tool called Libra that performs an all-vs-all comparison of metagenomes for precise clustering based on their k-mer content. Libra uses a scalable Hadoop framework for massive metagenome comparisons, Cosine Similarity for calculating the distance using sequence composition and abundance while normalizing for sequencing depth, and a web-based implementation in iMicrobe (http://imicrobe.us) that uses the CyVerse advanced cyberinfrastructure to promote broad use of the tool by the scientific community.

Conclusions: A comparison of Libra to equivalent tools using both simulated and real metagenomic datasets, ranging from 80 million to 4.2 billion reads, reveals that methods commonly implemented to reduce compute time for large datasets, such as data reduction, read count normalization, and presence/absence distance metrics, greatly diminish the resolution of large-scale comparative analyses. In contrast, Libra uses all of the reads to calculate k-mer abundance in a Hadoop architecture that can scale to any size dataset to enable global-scale analyses and link microbial signatures to biological processes.
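The idea of comparing samples by k-mer abundance with cosine similarity can be illustrated with a small in-memory sketch. This is not Libra's code: Libra runs on Hadoop, and the function names (`kmer_counts`, `cosine_similarity`, `pairwise_distances`), the use of raw counts as vector weights, and the toy reads are all assumptions made for illustration. The sketch shows why the cosine measure is insensitive to sequencing depth: multiplying every k-mer count by the same factor leaves the angle between two abundance vectors unchanged.

```python
from collections import Counter
from math import sqrt


def kmer_counts(reads, k):
    """Count every k-mer made only of A, C, G, T across all reads of a sample."""
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Skip k-mers containing ambiguous bases such as N.
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse k-mer abundance vectors."""
    dot = sum(a[kmer] * b[kmer] for kmer in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # An empty profile shares nothing with any other profile.
    return dot / (norm_a * norm_b)


def pairwise_distances(samples, k):
    """All-vs-all cosine distance (1 - similarity) for a dict of sample -> reads."""
    profiles = {name: kmer_counts(reads, k) for name, reads in samples.items()}
    names = sorted(profiles)
    return {
        (x, y): 1.0 - cosine_similarity(profiles[x], profiles[y])
        for x in names
        for y in names
    }


if __name__ == "__main__":
    # Hypothetical toy samples: "deep" holds the same read as "shallow",
    # sequenced three times as deeply; "other" has unrelated content.
    samples = {
        "shallow": ["ACGTACGT"],
        "deep": ["ACGTACGT"] * 3,
        "other": ["TTTTTTTT"],
    }
    for pair, dist in sorted(pairwise_distances(samples, k=4).items()):
        print(pair, round(dist, 6))
```

In this toy setup the "shallow" and "deep" samples come out at a distance of essentially zero despite the threefold difference in read count, while "other" shares no k-mers with either and sits at distance 1. A presence/absence metric would also call the first two identical, but it would discard the abundance information that the abstract argues is needed for resolution on real data.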
