Article
Biochemical Research Methods
Shimeng Huang, Elisabeth Ailer, Niki Kilbertus, Niklas Pfister
Summary: In this study, a kernel-based nonparametric regression and classification framework called KernelBiome is proposed for compositional data. It captures complex signals and automatically adapts model complexity. Experimental results on 33 publicly available microbiome datasets demonstrate its superior predictive performance and interpretability compared to state-of-the-art machine learning methods. Additionally, two novel quantities are proposed to interpret the contributions of individual components, and the connection between kernels and distances aids interpretability.
PLOS COMPUTATIONAL BIOLOGY
(2023)
Article
Biochemical Research Methods
Shulei Wang
Summary: In this study, a new differential abundance test called the MsRDB test is proposed, which embeds the sequences into a metric space and integrates a multiscale adaptive strategy to identify differentially abundant microbes. Compared with existing methods, the MsRDB test can detect differentially abundant microbes at the finest resolution offered by the data and is robust to zero counts, compositional effects, and experimental bias in microbial compositional datasets.
Article
Microbiology
Laura Sisk-Hackworth, Adrian Ortiz-Velez, Micheal B. Reed, Scott T. Kelley
Summary: Periodontal disease (PD) is a chronic, progressive polymicrobial disease that induces a strong host immune response. Next-generation sequencing (NGS) studies have shown that PD biodiversity increases with pocket depth and PD communities are highly host-specific. By applying compositional data analysis (CoDA) methods, new features associated with PD, including genera Schwartzia and Aerococcus, and the cytokine C-reactive protein, have been identified. Network analysis revealed lower connectivity among taxa in deeper periodontal pockets, indicating a more random microbiome.
FRONTIERS IN MICROBIOLOGY
(2021)
Article
Biochemical Research Methods
M. Luz Calle, Meritxell Pujolassos, Antoni Susin
Summary: coda4microbiome is a new algorithm for analyzing microbiome data in both cross-sectional and longitudinal studies. The algorithm uses penalized regression on log-ratio models for variable selection and infers dynamic microbial signatures through penalized regression on the summary of log-ratio trajectories. The package provides visual representations for interpretation of the analysis and identified microbial signatures.
BMC BIOINFORMATICS
(2023)
Article
Biology
Pixu Shi, Yuchen Zhou, Anru R. Zhang
Summary: This study introduces a simple, interpretable, and efficient method for regression with compositional data, based on a novel high-dimensional log-error-in-variable regression model that addresses zero read counts and randomness in the covariates.
Article
Multidisciplinary Sciences
Teo Nguyen, Kerrie Mengersen, Damien Sous, Benoit Liquet
Summary: This paper proposes an adaptation of the SMOTE technique called SMOTE for Compositional Data (SMOTE-CD) to address the issue of imbalanced compositional data. SMOTE-CD generates synthetic examples using compositional data operations and improves performance in various regression models. However, the impact of oversampling on performance varies depending on the model and data.
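The idea of generating a synthetic example with compositional data operations can be illustrated with a minimal sketch. It assumes the standard Aitchison operations (closure, perturbation, powering) and interpolates between two strictly positive compositions; the function names are hypothetical, and this is not necessarily the exact procedure used by SMOTE-CD.

```python
def closure(parts):
    """Rescale positive parts so they sum to 1."""
    total = sum(parts)
    return [p / total for p in parts]


def perturb(x, y):
    """Aitchison perturbation: component-wise product, then closure."""
    return closure([a * b for a, b in zip(x, y)])


def power(x, alpha):
    """Aitchison powering: component-wise power, then closure."""
    return closure([a ** alpha for a in x])


def interpolate(x, y, lam):
    """Point on the Aitchison line from x (lam = 0) to y (lam = 1).

    Assumes every part of x and y is strictly positive.
    """
    inverse_x = closure([1.0 / a for a in x])
    difference = perturb(y, inverse_x)  # compositional "y minus x"
    return perturb(x, power(difference, lam))


# Hypothetical compositions, chosen only for illustration.
x = [0.2, 0.3, 0.5]
y = [0.6, 0.3, 0.1]
synthetic = interpolate(x, y, 0.5)
```

Unlike plain linear interpolation of the raw proportions, the result is a valid composition by construction: its parts are positive and sum to 1.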
Article
Psychology, Multidisciplinary
Michael Smithson, Stephen B. Broomell
Summary: This article introduces techniques for dealing with dependency in data where numerical data sum to a constant for individual cases, known as compositional or ipsative data. Despite falling out of fashion, compositional data are common in psychological research and can provide unique insights. Sound methods for analyzing compositional data have been developed since the 1980s, and this article aims to enable researchers to analyze compositional data effectively.
PSYCHOLOGICAL METHODS
(2022)
Article
Mathematics, Interdisciplinary Applications
Michael Greenacre
Summary: Compositional data are nonnegative data with a constant-sum constraint, with logratios as the fundamental transformation. Combining components can alleviate the issue of zero values. Various statistical analyses can be performed after transforming the data into logratios.
ANNUAL REVIEW OF STATISTICS AND ITS APPLICATION, VOL 8, 2021
(2021)
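The two ideas in this summary, logratio transformation and combining components to deal with zeros, can be sketched as follows. The centered logratio (CLR) is used here as one common choice of logratio transformation; the helper names, the example composition, and the grouping are assumptions made for illustration.

```python
import math


def clr(parts):
    """Centered logratio: log of each part minus the mean of the logs.

    Requires strictly positive parts; math.log raises ValueError on 0.
    """
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def amalgamate(parts, groups):
    """Combine components by summing the parts listed in each group."""
    return [sum(parts[i] for i in group) for group in groups]


# Hypothetical four-part composition with one zero component.
composition = [0.5, 0.3, 0.2, 0.0]

# Merging the zero part with a neighbouring part removes the zero,
# so the logratio transformation becomes well defined.
combined = amalgamate(composition, [[0], [1], [2, 3]])
transformed = clr(combined)
```

The CLR values sum to zero, and the difference between any two of them equals the log of the ratio of the corresponding parts, which is why later analyses can work on the transformed values.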
Article
Computer Science, Information Systems
Preeti Saini, Bharti Nagpal
Summary: The study focuses on imputing missing data in a wheat crop yield dataset to improve crop estimation and production forecasting. Different imputation techniques are explored and evaluated for their performance. The results show that the Arithmetic Average Replacement method performs well among the statistical methods, while the MissForest and MICE methods perform well among the machine-learning-based methods.
MULTIMEDIA TOOLS AND APPLICATIONS
(2023)
Article
Biochemical Research Methods
Asli Boyraz, Vera Pawlowsky-Glahn, Juan Jose Egozcue, Aybar Can Acar
Summary: This study presents a novel approach that groups Operational Taxonomic Units (OTUs) based on relative abundances using principal balances, providing an alternative to taxon grouping. The proposed method has potential applications in dimensionality reduction and construction of microbial balances for disease prediction, offering a coherent data analysis for biomarker discovery in human microbiota.
BRIEFINGS IN BIOINFORMATICS
(2022)
Article
Biochemical Research Methods
Gianna Serafina Monti, Peter Filzmoser
Summary: High-throughput sequencing technologies provide a large amount of data for microbiome composition analysis, which requires consideration of data sparsity and uniqueness. This article proposes a regression variable selection method that takes into account the special nature of microbiome data, achieving sparsity and robustness in regression coefficient estimates through elastic-net regularization. The practical utility of the method is demonstrated through real-world application and simulation studies.
Article
Biochemical Research Methods
Divya Sharma, Wei Xu
Summary: This study introduces a novel deep learning framework, 'phyLoSTM', for analyzing temporal dependency in longitudinal microbiome sequencing data and predicting diseases in relation to the host's environmental factors. Results show promising performance in simulated and real microbiome studies.
Article
Biochemistry & Molecular Biology
Hendriek C. Boshuizen, Dennis E. te Beest
Summary: This paper lists 14 statistical methods or approaches that should be generally avoided for microbiome data analysis, either because the assumptions behind them are unlikely to be met or because they are used inappropriately. Researchers should conduct more critical evaluations and choose appropriate methods for microbiome data analysis.
MOLECULAR ECOLOGY RESOURCES
(2023)
Article
Biochemical Research Methods
Kuncheng Song, Yi-Hui Zhou
Summary: This article introduces a user-friendly R package named Correlation and Consensus-based Cross-taxonomy Network Analysis (C3NA) for investigating compositional microbial sequencing data to identify and compare co-occurrence patterns across different taxonomic levels. C3NA was used to analyze two well-studied diseases, colorectal cancer and Crohn's disease, and clusters of study- and disease-dependent taxa were discovered, overlapping with known functional taxa reported by other discovery studies and differential abundance analyses.
BMC BIOINFORMATICS
(2022)
Article
Geosciences, Multidisciplinary
K. G. van den Boogaart, P. Filzmoser, K. Hron, M. Templ, R. Tolosana-Delgado
Summary: Compositional data contain valuable information within the relationships between the compositional parts, which can be utilized for regression modeling. Balance coordinates are constructed to interpret regression coefficients and test hypotheses of subcompositional independence. Classical least-squares regression and robust MM regression were compared within different regression models using a real data set from a geochemical mapping project.
MATHEMATICAL GEOSCIENCES
(2021)
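A balance coordinate of the kind mentioned in this summary can be sketched as follows. The sketch assumes the usual definition of a balance between two groups of parts, a normalized log of the ratio of their geometric means; the function names and the grouping are hypothetical, and the construction used in the article may differ in detail.

```python
import math


def geometric_mean(values):
    """Geometric mean of strictly positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def balance(parts, numerator, denominator):
    """Balance between two disjoint groups of parts, given by index lists.

    Computes sqrt(r * s / (r + s)) * ln(gm(numerator) / gm(denominator)),
    where r and s are the sizes of the two groups.
    """
    r, s = len(numerator), len(denominator)
    gm_num = geometric_mean([parts[i] for i in numerator])
    gm_den = geometric_mean([parts[i] for i in denominator])
    return math.sqrt(r * s / (r + s)) * math.log(gm_num / gm_den)


# Hypothetical four-part composition: parts 0 and 1 against parts 2 and 3.
composition = [0.1, 0.2, 0.3, 0.4]
value = balance(composition, [0, 1], [2, 3])
```

The balance depends only on ratios between parts, so rescaling the whole composition leaves it unchanged, and swapping the two groups flips its sign.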
Article
Automation & Control Systems
Haifei Peng, Jian Long, Cheng Huang, Shibo Wei, Zhencheng Ye
Summary: This paper proposes a novel multi-modal hybrid modeling strategy (GMVAE-STA) that can effectively extract deep multi-modal representations and complex spatial and temporal relationships, and applies it to industrial process prediction.
CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS
(2024)