Article

A scalable assembly-free variable selection algorithm for biomarker discovery from metagenomes

Journal

BMC BIOINFORMATICS
Volume 17

Publisher

BIOMED CENTRAL LTD
DOI: 10.1186/s12859-016-1186-3

Keywords

Metagenomics; Binning; Unsupervised learning; Environmental genomics; Microbiome; Sequence clustering

Funding

  1. French Alternative Energies and Atomic Energy Commission (Commissariat à l'Énergie Atomique et aux Énergies Alternatives), through its transversal programme Technologies pour la Santé [TS_METATARGET]

Background: Metagenomics holds great promise for deepening our knowledge of key bacterially driven processes, but metagenome assembly remains problematic, typically resulting in representation biases and discarding significant amounts of non-redundant sequence information. To alleviate the constraints that assembly can impose on downstream analyses, and/or to increase the fraction of raw reads assembled via targeted assemblies relying on pre-assembly binning steps, we developed a set of binning modules and evaluated their combination in a new assembly-free binning protocol.

Results: We describe a scalable multi-tiered binning algorithm that combines frequency and compositional features to cluster unassembled reads, and demonstrate i) significant runtime performance gains of the developed modules against state-of-the-art software, obtained through parallelization and the efficient use of large lock-free concurrent hash maps; ii) its relevance for clustering unassembled reads from high-complexity samples (e.g., harboring 700 distinct genomes); iii) its relevance to experimental setups involving multiple samples, through a use case consisting of the de novo identification of sequences from a target genome (e.g., a pathogenic strain) segregating at low levels in a cohort of 50 complex microbiomes (harboring 100 distinct genomes each), in the background of closely related strains and in the absence of reference genomes; iv) its ability to correctly identify clusters of sequences from the E. coli O104:H4 genome as the most strongly correlated to the infection status in 53 microbiomes sampled from the 2011 STEC outbreak in Germany, and to accurately cluster contigs of this pathogenic strain from a cross-assembly of these 53 microbiomes.

Conclusions: We present a set of sequence clustering (binning) modules and their application to biomarker (e.g., genomes of pathogenic organisms) discovery from large synthetic and real metagenomics datasets. Although the modules were initially designed for the assembly-free analysis of individual metagenomic samples, we demonstrate their extension to setups involving multiple samples via the alignment-free d2S statistic, which is used to relate clusters across samples, and illustrate how the clustering modules can otherwise be leveraged for de novo pre-assembly tasks by segregating sequences into biologically meaningful partitions.
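
For readers unfamiliar with the alignment-free d2S statistic mentioned in the abstract, the following is a minimal Python sketch of how such a dissimilarity can be computed between two sequences from their k-mer counts. It is not the authors' implementation. It assumes the commonly published form of the d2S dissimilarity (k-mer counts centered by their expected value, then self-normalized) and, as a simplifying assumption, an i.i.d. background model estimated from each sequence's own base frequencies. The function names (`kmer_counts`, `base_freqs`, `centered`, `d2s`) and the default `k=3` are hypothetical choices made for illustration only.

```python
from collections import Counter
from itertools import product
from math import sqrt

ALPHABET = "ACGT"


def kmer_counts(seq, k):
    """Count overlapping k-mers, skipping words with non-ACGT symbols."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= set(ALPHABET):
            counts[word] += 1
    return counts


def base_freqs(seq):
    """Single-base frequencies (assumed i.i.d. background model).

    Assumes the sequence contains at least one A, C, G or T.
    """
    seq = seq.upper()
    total = sum(seq.count(b) for b in ALPHABET)
    return {b: seq.count(b) / total for b in ALPHABET}


def centered(counts, freqs, k):
    """Observed k-mer count minus its expected count under the background."""
    n_words = sum(counts.values())
    out = {}
    for letters in product(ALPHABET, repeat=k):
        word = "".join(letters)
        p = 1.0
        for b in word:
            p *= freqs[b]
        out[word] = counts.get(word, 0) - n_words * p
    return out


def d2s(seq_a, seq_b, k=3):
    """d2S-style dissimilarity in [0, 1]; 0 means identical centered profiles."""
    xa = centered(kmer_counts(seq_a, k), base_freqs(seq_a), k)
    xb = centered(kmer_counts(seq_b, k), base_freqs(seq_b), k)
    num = norm_a = norm_b = 0.0
    for word in xa:
        x, y = xa[word], xb[word]
        denom = sqrt(x * x + y * y)
        if denom == 0.0:
            continue  # this word carries no signal in either sequence
        num += x * y / denom
        norm_a += x * x / denom
        norm_b += y * y / denom
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.5  # no usable signal: treat the profiles as uncorrelated
    return 0.5 * (1.0 - num / (sqrt(norm_a) * sqrt(norm_b)))
```

Under these assumptions, `d2s(a, b)` is symmetric and returns a value near 0 when the two centered k-mer profiles agree. To relate clusters rather than single sequences, one plausible adaptation is to pool the k-mer counts of all reads in a cluster before centering; whether the published protocol does exactly this is not stated in the abstract.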

Recommended

Article Multidisciplinary Sciences

Characterization of cassava ORANGE proteins and their capability to increase provitamin A carotenoids accumulation

Angelica M. Jaramillo, Santiago Sierra, Paul Chavarriaga-Aguirre, Diana Katherine Castillo, Anestis Gkanogiannis, Luis Augusto Becerra Lopez-Lavalle, Juan Pablo Arciniegas, Tianhu Sun, Li Li, Ralf Welsch, Erick Boy, Daniel Alvarez

Summary: Cassava biofortification with provitamin A carotenoids is a process aimed at alleviating vitamin A deficiency. A special protein called ORANGE protein (OR) has been found to play a role in carotenoid biosynthesis and stability in cassava. The expression and protein levels of OR and phytoene synthase (PSY) were evaluated in different cassava cultivars, and it was found that OR protein levels were higher in yellow cultivars, while PSY expression was higher in yellow cultivars but protein levels remained the same. Overexpression of one variant of OR greatly increased carotenoid levels in cassava. These findings contribute to the understanding of carotenoid accumulation in cassava.

PLOS ONE (2022)

Article Biochemistry & Molecular Biology

Methylation in the CHH Context Allows to Predict Recombination in Rice

Mauricio Penuela, Jenny Johana Gallo-Franco, Jorge Finke, Camilo Rocha, Anestis Gkanogiannis, Thaura Ghneim-Herrera, Mathias Lorieux

Summary: This study investigates the association between recombination rates and DNA methylation in two commercial rice varieties, revealing negative and positive correlations between them, as well as the significant impact of the centromere region. Furthermore, machine learning regression models are developed to predict recombination rates using methylated cytosines in the CHH context.

INTERNATIONAL JOURNAL OF MOLECULAR SCIENCES (2022)

Article Plant Sciences

The South Green portal: a comprehensive resource for tropical and Mediterranean crop genomics

Valentin Guignon, Yann Hueber, Mathieu Rouard, Stephanie Bocs, David Couvin, Frederic De Lamotte, Gaetan Droc, Jean-Francois Dufayard, Nordine El Hassouni, Cedric Farcy, Anestis Gkanogiannis, Chantal Hamelin, Delphine Lariviere, Guillaume Martin, Enrique Ortega, Bertrand Pitollat, Stephanie Pointet, Manuel Ruiz, Gautier Sarah, Marilyne Summo, Sebastien Ravel, Pierre Larmande, Cecile Monat, Francois Sabot, Ndomassi Tando, Christine Tranchant-Dubreuil, Guilhem Sempere, Alexis Dereeper

CURRENT PLANT BIOLOGY (2016)
