Article

Towards large-scale FAME-based bacterial species identification using machine learning techniques

Journal

SYSTEMATIC AND APPLIED MICROBIOLOGY
Volume 32, Issue 3, Pages 163-176

Publisher

ELSEVIER GMBH
DOI: 10.1016/j.syapm.2009.01.003

Keywords

Bacillus; Bacteria; Fatty acid methyl ester; Gas chromatography; Identification; Machine learning; Paenibacillus; Pseudomonas; Species; Taxonomy

Funding

  1. Belgian Science Policy [C3/00/12, IAP VI-PAI VI/06]


In the last decade, bacterial taxonomy has expanded enormously. The swift pace of bacterial species (re-)definitions has a serious impact on the accuracy and completeness of first-line identification methods. Consequently, back-end identification libraries need to be synchronized with the List of Prokaryotic names with Standing in Nomenclature. In this study, we focus on bacterial fatty acid methyl ester (FAME) profiling as a broadly used first-line identification method. From the BAME@LMG database, we have selected FAME profiles of individual strains belonging to the genera Bacillus, Paenibacillus and Pseudomonas. Only those profiles resulting from standard growth conditions have been retained. The corresponding data set covers 74, 44 and 95 validly published bacterial species, respectively, represented by 961, 378 and 1673 standard FAME profiles. Through the application of machine learning techniques in a supervised strategy, different computational models have been built for genus and species identification. Three techniques have been considered: artificial neural networks, random forests and support vector machines. Nearly perfect identification has been achieved at genus level. Notwithstanding the known limited discriminative power of FAME analysis for species identification, the computational models have resulted in good species identification results for the three genera. For Bacillus, Paenibacillus and Pseudomonas, random forests have resulted in sensitivity values of 0.847, 0.901 and 0.708, respectively. The random forest models outperform those of the other machine learning techniques. Moreover, our machine learning approach also outperformed the Sherlock MIS (MIDI Inc., Newark, DE, USA). These results show that machine learning proves very useful for FAME-based bacterial species identification. Besides good bacterial identification at species level, speed and ease of taxonomic synchronization are major advantages of this computational species identification strategy. (C) 2009 Elsevier GmbH. All rights reserved.
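
The sensitivity values quoted in the abstract can be read as recall: the fraction of profiles of a species that the model assigns to that species. The abstract does not say how the per-species values were combined into one figure per genus, so the minimal sketch below assumes an unweighted (macro) average over species. The function names and the toy labels are hypothetical and are not taken from the study.

```python
from collections import Counter


def per_class_sensitivity(y_true, y_pred):
    """Return {label: TP / (TP + FN)} for every label that occurs in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one labelled profile is required")
    totals = Counter(y_true)  # TP + FN per species
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # TP per species
    return {label: hits[label] / n for label, n in totals.items()}


def macro_sensitivity(y_true, y_pred):
    """Unweighted mean of the per-species sensitivities (an assumed averaging)."""
    scores = per_class_sensitivity(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical toy labels, not data from the study.
y_true = ["B. subtilis", "B. subtilis", "B. cereus", "B. cereus", "B. cereus", "B. pumilus"]
y_pred = ["B. subtilis", "B. cereus", "B. cereus", "B. cereus", "B. pumilus", "B. pumilus"]

scores = per_class_sensitivity(y_true, y_pred)
macro = macro_sensitivity(y_true, y_pred)
```

With macro averaging, a species represented by a handful of profiles weighs as much as one represented by hundreds. A profile-weighted average would instead equal the overall accuracy, so the two conventions can give noticeably different figures on an unbalanced data set.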

Recommended

Review Food Science & Technology

Food biodiversity: Quantifying the unquantifiable in human diets

Giles T. Hanley-Cook, Aisling J. Daly, Roseline Remans, Andrew D. Jones, Kris A. Murray, Inge Huybrechts, Bernard De Baets, Carl Lachat

Summary: Dietary diversity is an important public health principle, and its measurement is crucial for assessing diet quality and food security. However, the conventional methods fail to capture the full range of food diversity, thus requiring further improvement and adaptation.

CRITICAL REVIEWS IN FOOD SCIENCE AND NUTRITION (2023)

Article Remote Sensing

Application of the Point-Descriptor-Precedence representation for micro-scale traffic analysis at a non-signalized T-junction

Amna Qayyum, Bernard De Baets, Laure De Cock, Frank Witlox, Guy De Tre, Nico Van de Weghe

Summary: This paper explores the micro-scale traffic interactions at intersections and presents a novel approach to detect and represent the micro-scale traffic movement interactions at a non-signalized T-junction. The study shows that this approach allows for more detailed tracking of vehicle movements and plays an important role in traffic safety assessment.

GEO-SPATIAL INFORMATION SCIENCE (2023)

Article Statistics & Probability

A Nearest Neighbor Open-Set Classifier based on Excesses of Distance Ratios

Matthys Lucas Steyn, Tertius de Wet, Bernard De Baets, Stijn Luca

Summary: This article proposes an open-set recognition model based on extreme value statistics, which introduces a distance ratio to express the dissimilarity between a target point and known classes, and uses the class of generalized Pareto distributions to model the peaks of the distance ratio, providing a probabilistic framework for open-set recognition.

JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS (2023)

Article Education & Educational Research

Pass/Fail Prediction in Programming Courses

Charlotte Van Petegem, Louise Deconinck, Dieter Mourisse, Rien Maertens, Niko Strijbol, Bart Dhoedt, Bram De Wever, Peter Dawyndt, Bart Mesuere

Summary: We present a privacy-friendly early detection framework that can identify students at risk of failing in introductory programming courses at university. The framework has been validated in different course settings and shows high accuracy and robustness. It also provides insight into the impact of programming skills on learning and can predict students' future success early in the semester.

JOURNAL OF EDUCATIONAL COMPUTING RESEARCH (2023)

Article Computer Science, Interdisciplinary Applications

Combining natural language processing and multidimensional classifiers to predict and correct CMMS metadata

Arne Deloose, Glenn Gysels, Bernard De Baets, Jan Verwaeren

Summary: This paper explores the use of natural language processing techniques to predict structured metadata in failure notifications. It highlights the challenges posed by the technical nature of the texts and the use of sentence fragments and abbreviations. The authors demonstrate that considering the dependencies between different components of the metadata and treating the prediction problem as a multidimensional classification problem can improve label prediction reliability.

COMPUTERS IN INDUSTRY (2023)

Article Computer Science, Theory & Methods

Fuzzy structures induced by fuzzy betweenness relations

Yi Shi, Bin Pang, Bernard De Baets

Summary: In this article, the relationships between the fuzzy betweenness relations and three important mathematical notions, such as fuzzy interval operators, fuzzy partial orders, and fuzzy Peano-Pasch spaces, are explored in the setting of complete residuated lattices. The resulting category of fuzzy betweenness relations with respect to a fuzzy equivalence relation is shown to be isomorphic to that of geometric fuzzy interval spaces. A fuzzy partial order is constructed from a fuzzy betweenness relation, and its relationships are analyzed in depth. Furthermore, the concept of a fuzzy betweenness field is introduced, and it is shown that a vector space over a fuzzy betweenness field can yield a fuzzy Peano-Pasch space in the setting of completely distributive lattices.

FUZZY SETS AND SYSTEMS (2023)

Article Computer Science, Theory & Methods

Triangular norms on bounded trellises

Lemnaouar Zedam, Bernard De Baets

Summary: In this paper, the concept of a t-norm on bounded pseudo-ordered sets and bounded trellises is introduced, along with some basic examples. The impact of abandoning transitivity is discussed, highlighting that the meet operation is not a t-norm on a proper bounded trellis, and there may be no or multiple maximal t-norms. A generic construction method is provided to extend a t-norm on an interior range of a given perpendicular to-semi-trellis to the entire trellis, with a specific instantiation based on a finite sub-trellis of right-transitive elements. The focus is also on bounded pseudo-chains and modular trellises.

FUZZY SETS AND SYSTEMS (2023)

Article Water Resources

Future multivariate weather generation by combining Bartlett-Lewis and vine copula models

Jorn van de Velde, Matthias Demuzere, Bernard De Baets, Niko Verhoest

Summary: This study presents a weather generator combining Bartlett-Lewis models and vine copulas, which can generate time series with statistics similar to those of the input. The generator shows adequate performance in terms of statistical moments and correlation, making it useful for characterizing future extremes.

HYDROLOGICAL SCIENCES JOURNAL (2023)

Article Computer Science, Artificial Intelligence

Prediction of pipe failures in water supply networks for longer time periods through multi-label classification

Alicia Robles-Velasco, Pablo Cortes, Jesus Munuzuri, Bernard De Baets

Summary: This study proposes the use of multi-label classification techniques to predict pipe failures in water supply systems for multiple years. Various models and prediction time periods are analyzed, showing successful results in avoiding pipe failures over time.

EXPERT SYSTEMS WITH APPLICATIONS (2023)

Article Computer Science, Theory & Methods

A decomposition theorem for number-conserving multi-state cellular automata on triangular grids

Barbara Wolnik, Anna Nenca, Bernard De Baets

Summary: This paper discusses two-dimensional cellular automata on a triangular grid that maintain the sum of all cell states. To investigate such automata, the split-and-perturb decomposition method is applied to triangular grids, which was originally developed for square grids. This results in a new mathematical tool that can enumerate k-ary number-conserving cellular automata on a triangular grid, regardless of the value of k.

THEORETICAL COMPUTER SCIENCE (2023)

Article Infectious Diseases

The Potential of Surveillance Data for Dengue Risk Mapping: An Evaluation of Different Approaches in Cuba

Waldemar Baldoquin Rodriguez, Mayelin Mirabal, Patrick Van der Stuyft, Tania Gomez Padron, Viviana Fonseca, Rosa Maria Castillo, Sonia Monteagudo Diaz, Jan M. Baetens, Bernard De Baets, Maria Eugenia Toledo Romani, Veerle Vanlerberghe

Summary: In order to improve dengue prevention and control efforts, the use of routinely collected data to develop risk maps is recommended. By using data from two municipalities in Cuba, dengue experts identified indicators representative of entomological, epidemiological, and demographic risks to construct risk maps. However, there was low agreement between vulnerability and incidence-based risk maps in areas with a prolonged history of dengue transmission, suggesting that an incidence-based approach may not fully capture the complexity of vulnerability.

TROPICAL MEDICINE AND INFECTIOUS DISEASE (2023)

Article Computer Science, Information Systems

Non-uniform number-conserving elementary cellular automata on the infinite grid: A tale of the unexpected

Barbara Wolnik, Maciej Dziemianczuk, Bernard De Baets

Summary: In this paper, non-uniform elementary cellular automata on the infinite grid in the context of number conservation are studied. The study provides a comprehensive description of these automata. Previous research only focused on finite grids and derived hypotheses based on computer experiments. It is found that when considering number conservation for non-uniform cellular automata, the infinite grid cannot be treated as a limiting case of finite grids.

INFORMATION SCIENCES (2023)

Article Ecology

Environment-dependent population dynamics emerging from dynamic energy budgets and individual-scale movement behaviour

Wissam Barhdadi, Aisling J. Daly, Jan M. Baetens, Bernard De Baets

Summary: Individual biology influences population dynamics dependent on the environment through life history. Recent research has integrated metabolic theory with individual-based models to explore the link between individual physiology and demography. However, current population models do not consider individual behaviors, relying instead on imposed population-level relationships. This study proposes extending the model to include individual-scale behaviors and demonstrates its effectiveness in simulating consumer dynamics in a heterogeneous environment.

OIKOS (2023)

Article Physics, Fluids & Plasmas

Seven-state rotation-symmetric number-conserving cellular automaton that is not isomorphic to any septenary one

Barbara Wolnik, Anna Nenca, Adam Dzedzej, Bernard De Baets

Summary: This paper discusses two-dimensional cellular automata with rotation symmetry and number conservation. It is shown that if the number of states k is smaller than or equal to six, then each rotation-symmetric number-conserving cellular automaton is isomorphic to some k-ary one. However, an example of a seven-state rotation-symmetric number-conserving cellular automaton is provided in this paper, which demonstrates the importance of not only focusing on cellular automata with {0, 1, ..., k-1} as state sets.

PHYSICAL REVIEW E (2023)

Proceedings Paper Computer Science, Information Systems

The Winning Probability Relation of Parametrized Families of Random Vectors

Hans De Meyer, Bernard De Baets

Summary: This paper highlights the importance of calculating pairwise winning probabilities between the components of random vectors, and notes that the reciprocal relation derived from these probabilities can be used as an alternative way to establish a stochastic dominance order.

BUILDING BRIDGES BETWEEN SOFT AND STATISTICAL METHODOLOGIES FOR DATA SCIENCE (2023)
