Article

Deep learning models for bacteria taxonomic classification of metagenomic data

Journal

BMC BIOINFORMATICS
Volume 19

Publisher

BMC
DOI: 10.1186/s12859-018-2182-6

Keywords

Metagenomic; Classification; CNN; DBN; k-mer representation; Amplicon; Shotgun

Funding

  1. CNR Interomics Flagship Project

Background: An open challenge in translational bioinformatics is the analysis of sequenced metagenomes from various environmental samples. Several studies have demonstrated that the 16S ribosomal RNA can serve as a barcode for bacteria classification at the genus level, but it is still hard to identify the correct composition of metagenomic data from RNA-seq short-read data. 16S short-read data are generated using two next-generation sequencing approaches, i.e. whole genome shotgun (WGS) and amplicon (AMP); typically, the former is filtered to obtain short reads belonging to the 16S gene (16S shotgun, SG), whereas the latter targets only specific 16S hypervariable regions. Since SG and AMP are used as alternatives, in this work we propose a deep learning approach for the taxonomic classification of metagenomic data that can be applied to both.
Results: To test the proposed pipeline, we simulated both SG and AMP short reads from 1000 full-length 16S sequences. Then, we adopted a k-mer representation to map sequences to vectors in a numerical space. Finally, we trained two different deep learning architectures, i.e. a convolutional neural network (CNN) and a deep belief network (DBN), obtaining a trained model for each taxonomic level. We tested the proposed methodology to find the best parameter configuration, and we compared our results against the classification performance of a reference classifier for bacteria identification, the RDP classifier. Both architectures outperformed the RDP classifier at every taxonomic level. For instance, at the genus level, both CNN and DBN reached 91.3% accuracy on AMP short reads, whereas the RDP classifier obtained 83.8% on the same data.
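The k-mer step mentioned in the Results can be illustrated with a short Python sketch. It assumes that each read is mapped to a vector of normalized k-mer frequencies over all 4^k possible k-mers of the alphabet ACGT, and that k-mers containing ambiguous bases are skipped. The abstract does not state the value of k or the normalization used, so the default k = 5 and the names kmer_index and kmer_vector are illustrative choices, not the authors' code.

```python
from functools import lru_cache
from itertools import product

ALPHABET = "ACGT"


@lru_cache(maxsize=None)
def kmer_index(k: int) -> dict:
    """Assign each of the 4**k possible k-mers a fixed position (lexicographic order)."""
    return {"".join(p): i for i, p in enumerate(product(ALPHABET, repeat=k))}


def kmer_vector(seq: str, k: int = 5) -> list:
    """Map a read to a 4**k vector of normalized k-mer frequencies.

    Assumption: frequency normalization and k=5 are illustrative defaults,
    not parameters taken from the paper.
    """
    index = kmer_index(k)
    counts = [0.0] * len(index)
    total = 0
    seq = seq.upper()
    # Slide a window of width k over the read, one base at a time
    for start in range(len(seq) - k + 1):
        i = index.get(seq[start:start + k])
        if i is None:  # k-mer contains a non-ACGT symbol such as N: skip it
            continue
        counts[i] += 1
        total += 1
    # Normalize counts to frequencies so reads of different length are comparable
    return [c / total for c in counts] if total else counts
```

For example, kmer_vector("ACGTACGT", k=3) returns a 64-element vector in which the entry for ACG equals 2/6, since ACG occurs twice among the six 3-mers of the read. Because the vector length depends only on k, reads of any length become fixed-size inputs for a classifier.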
Conclusions: In this work, we proposed a technique for classifying 16S short-read sequences based on a k-mer representation and deep learning architectures, in which a separate classification model is built for each taxonomic level (from phylum to genus). Experimental results confirm the proposed pipeline as a valid approach for classifying bacterial sequences; for this reason, our approach could be integrated into the most common tools for metagenomic analysis. The results show that it can be successfully used for classifying both SG and AMP data.
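Because the pipeline builds one classifier per taxonomic level, a read is classified by querying each level's model independently. The Python sketch below illustrates only that organization. To stay dependency-free it swaps in a simple nearest-centroid classifier in place of the CNN and DBN used in the paper; RANKS, NearestCentroid, train_per_rank, classify and accuracy are hypothetical names, and in practice the input vectors would be k-mer representations of the reads.

```python
import math
from collections import defaultdict

# Taxonomic levels for which a separate model is trained (phylum to genus)
RANKS = ("phylum", "class", "order", "family", "genus")


class NearestCentroid:
    """Toy stand-in for a per-level CNN/DBN (assumption, not the paper's model):
    predicts the label whose mean training vector is closest."""

    def fit(self, vectors, labels):
        sums = {}
        counts = defaultdict(int)
        for v, y in zip(vectors, labels):
            acc = sums.setdefault(y, [0.0] * len(v))
            for j, x in enumerate(v):
                acc[j] += x
            counts[y] += 1
        self.centroids = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}
        return self

    def predict(self, v):
        # Euclidean distance to each class centroid; the nearest one wins
        return min(self.centroids, key=lambda y: math.dist(v, self.centroids[y]))


def train_per_rank(vectors, lineages):
    """Fit one independent model per taxonomic level; lineages[i] maps rank -> taxon of read i."""
    return {rank: NearestCentroid().fit(vectors, [lin[rank] for lin in lineages])
            for rank in RANKS}


def classify(models, v):
    """Query every per-level model with the same read vector."""
    return {rank: models[rank].predict(v) for rank in RANKS}


def accuracy(predicted, truth):
    """Fraction of reads whose predicted taxon matches the true one."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)
```

One consequence of training independent models per level is that their predictions are not forced to agree on a single lineage: in principle, a read could receive a genus that does not belong to its predicted family. Per-level accuracy, as reported in the Results, is computed separately for each model.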
