Article

Phylogeny-corrected identification of microbial gene families relevant to human gut colonization

Journal

PLOS COMPUTATIONAL BIOLOGY
Volume 14, Issue 8

Publisher

PUBLIC LIBRARY OF SCIENCE
DOI: 10.1371/journal.pcbi.1006242

Funding

  1. NSF [DMS-1069303, DMS-1563159]
  2. Gordon & Betty Moore Foundation [3300]
  3. Gladstone Institutes

The mechanisms by which different microbes colonize the healthy human gut versus other body sites, the gut in disease states, or other environments remain largely unknown. Identifying microbial genes influencing fitness in the gut could lead to new ways to engineer probiotics or disrupt pathogenesis. We approach this problem by measuring the statistical association between a species having a gene and the probability that the species is present in the gut microbiome. The challenge is that closely related species tend to be jointly present or absent in the microbiome and also share many genes, only a subset of which are involved in gut adaptation. We show that this phylogenetic correlation indeed leads to many false discoveries and propose phylogenetic linear regression as a powerful solution. To apply this method across the bacterial tree of life, where most species have not been experimentally phenotyped, we use metagenomes from hundreds of people to quantify each species' prevalence in and specificity for the gut microbiome. This analysis reveals thousands of genes potentially involved in adaptation to the gut across species, including many novel candidates as well as processes known to contribute to fitness of gut bacteria, such as acid tolerance in Bacteroidetes and sporulation in Firmicutes. We also find microbial genes associated with a preference for the gut over other body sites, which are significantly enriched for genes linked to fitness in an in vivo competition experiment. Finally, we identify gene families associated with higher prevalence in patients with Crohn's disease, including Proteobacterial genes involved in conjugation and fimbria regulation, processes previously linked to inflammation. These gene targets may represent new avenues for modulating host colonization and disease. Our strategy of combining metagenomics with phylogenetic modeling is general and can be used to identify genes associated with adaptation to any environment.
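
To make the idea of phylogenetic linear regression concrete, here is a minimal sketch, not the authors' implementation. It assumes a Brownian-motion-style model in which the residual covariance between two species is proportional to their shared branch length, and it fits the regression by generalized least squares. The function name `phylo_gls`, the toy covariance matrix `C`, and the toy data are all hypothetical and chosen only for illustration.

```python
import numpy as np


def phylo_gls(y, x, C):
    """Fit y ~ intercept + x by generalized least squares.

    y : per-species phenotype (e.g. a transformed gut prevalence), shape (n,)
    x : per-species gene presence/absence (0 or 1), shape (n,)
    C : assumed phylogenetic covariance between species, shape (n, n),
        symmetric positive definite
    Returns the coefficients [intercept, slope] and their t-statistics.
    """
    n = len(y)
    X = np.column_stack([np.ones(n), x])

    # Whiten with the Cholesky factor of C so that the transformed
    # residuals are uncorrelated; ordinary least squares then applies.
    L = np.linalg.cholesky(C)
    Xw = np.linalg.solve(L, X)
    yw = np.linalg.solve(L, y)

    beta, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
    resid = yw - Xw @ beta
    sigma2 = resid @ resid / (n - X.shape[1])
    cov_beta = sigma2 * np.linalg.inv(Xw.T @ Xw)
    t_stats = beta / np.sqrt(np.diag(cov_beta))
    return beta, t_stats


# Hypothetical toy data: six species in three sister pairs. Species in a
# pair share most of their evolutionary history (covariance 0.9).
pair = np.array([[1.0, 0.9], [0.9, 1.0]])
C = np.kron(np.eye(3), pair)
y = np.array([2.0, 1.8, -1.0, -1.2, 0.5, 0.3])   # made-up phenotype values
x = np.array([1.0, 1.0, 0.0, 0.0, 1.0, 0.0])     # made-up gene presence

beta_phylo, t_phylo = phylo_gls(y, x, C)          # accounts for relatedness
beta_naive, t_naive = phylo_gls(y, x, np.eye(6))  # treats species as independent
```

Passing the identity matrix as `C` reduces the fit to ordinary least squares, which treats every species as an independent observation. Comparing the two sets of t-statistics on the same data shows how modeling the covariance among related species changes the evidence for a gene-phenotype association.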
