Comparison of zero replacement strategies for compositional data with large numbers of zeros

CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS (2021)

Journal

CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS

Volume 210, Issue -, Pages -

Publisher

ELSEVIER

DOI: 10.1016/j.chemolab.2021.104248

Keywords

Imputation; Compositional data analysis; ZeroSum regression; Microbiome data

Categories

Automation & Control Systems; Chemistry, Analytical; Computer Science, Artificial Intelligence; Instruments & Instrumentation; Mathematics, Interdisciplinary Applications; Statistics & Probability

Abstract

Modern applications in chemometrics and bioinformatics often involve compositional data sets with a high proportion of zeros, such as microbiome data. When building statistical models, it is crucial to replace zeros with sensible values. Different replacement techniques are compared, including a method based on deep learning, to provide insights into their appropriateness for specific problems and discuss differences in statistical results.

Modern applications in chemometrics and bioinformatics result in compositional data sets with a high proportion of zeros. One example is microbiome data, where zeros refer to measurements below the detection limit of one count. When building statistical models, it is important that zeros are replaced by sensible values. Different replacement techniques from compositional data analysis are considered and compared in a simulation study and on examples. The comparison also includes a recently proposed method (Templ, 2020) [1] based on deep learning. Detailed insights into the appropriateness of the methods for a problem at hand are provided, and differences in the outcomes of statistical results are discussed.
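To make the idea of replacing zeros before a log-ratio analysis concrete, here is a minimal Python sketch of simple multiplicative replacement followed by a centered log-ratio (clr) transform. It is an illustration only and is not claimed to be one of the methods compared in the paper. The function names, the example counts, and the choice of 0.65 times the one-count detection limit as the imputed value are all assumptions made for this sketch.

```python
import math


def multiplicative_replacement(counts, delta_fraction=0.65):
    """Replace zeros in one count vector; return a composition summing to 1.

    Assumption: each zero becomes delta = delta_fraction * (1 / total), i.e. a
    fraction of the one-count detection limit expressed as a proportion.
    Nonzero parts are shrunk by a common factor, so the ratios between them
    are preserved.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("at least one count must be positive")
    delta = delta_fraction / total
    n_zeros = sum(1 for c in counts if c == 0)
    shrink = 1.0 - n_zeros * delta
    if shrink <= 0:
        raise ValueError("too many zeros for this delta; choose a smaller delta_fraction")
    return [delta if c == 0 else (c / total) * shrink for c in counts]


def clr(composition):
    """Centered log-ratio transform; requires strictly positive parts."""
    logs = [math.log(p) for p in composition]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


# Hypothetical sample: five taxa, two of them unobserved (zero counts).
sample = [12, 0, 3, 0, 85]
replaced = multiplicative_replacement(sample)
coords = clr(replaced)
```

After replacement every part is strictly positive, so logarithms are defined, and the ratio between any two originally nonzero parts is unchanged. The imputed value itself is a modelling choice, and the downstream results can depend on it, which is the sensitivity the abstract refers to.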

Reviews

Primary rating: 4.7 (not enough ratings)

Secondary ratings

Novelty: -

Significance: -

Scientific rigor: -

Recommended

Article Biochemical Research Methods

Supervised learning and model analysis with compositional data

Shimeng Huang, Elisabeth Ailer, Niki Kilbertus, Niklas Pfister

Summary: In this study, a kernel-based nonparametric regression and classification framework called KernelBiome is proposed for compositional data. It captures complex signals and automatically adapts model complexity. Experimental results on 33 publicly available microbiome datasets demonstrate its superior predictive performance and interpretability compared to state-of-the-art machine learning methods. Additionally, two novel quantities are proposed to interpret the contributions of individual components, and the connection between kernels and distances aids interpretability.

PLOS COMPUTATIONAL BIOLOGY (2023)

Article Biochemical Research Methods

Multiscale adaptive differential abundance analysis in microbial compositional data

Shulei Wang

Summary: In this study, a new differential abundance test called the MsRDB test is proposed, which embeds the sequences into a metric space and integrates a multiscale adaptive strategy to identify differentially abundant microbes. Compared with existing methods, the MsRDB test can detect differentially abundant microbes at the finest resolution offered by data and is robust to zero counts, compositional effect, and experimental bias in the microbial compositional dataset.

BIOINFORMATICS (2023)

Article Microbiology

Compositional Data Analysis of Periodontal Disease Microbial Communities

Laura Sisk-Hackworth, Adrian Ortiz-Velez, Micheal B. Reed, Scott T. Kelley

Summary: Periodontal disease (PD) is a chronic, progressive polymicrobial disease that induces a strong host immune response. Next-generation sequencing (NGS) studies have shown that PD biodiversity increases with pocket depth and PD communities are highly host-specific. By applying compositional data analysis (CoDA) methods, new features associated with PD, including genera Schwartzia and Aerococcus, and the cytokine C-reactive protein, have been identified. Network analysis revealed lower connectivity among taxa in deeper periodontal pockets, indicating a more random microbiome.

FRONTIERS IN MICROBIOLOGY (2021)

Article Biochemical Research Methods

coda4microbiome: compositional data analysis for microbiome cross-sectional and longitudinal studies

M. Luz Calle, Meritxell Pujolassos, Antoni Susin

Summary: coda4microbiome is a new algorithm for analyzing microbiome data in both cross-sectional and longitudinal studies. The algorithm uses penalized regression on log-ratio models for variable selection and infers dynamic microbial signatures through penalized regression on the summary of log-ratio trajectories. The package provides visual representations for interpretation of the analysis and identified microbial signatures.

BMC BIOINFORMATICS (2023)

Article Biology

High-dimensional log-error-in-variable regression with applications to microbial compositional data analysis

Pixu Shi, Yuchen Zhou, Anru R. Zhang

Summary: This study introduces a simple, interpretable, and efficient method for estimating compositional data regression using a novel high-dimensional log-error-in-variable regression model to address issues with zero read counts and randomness in covariates.

BIOMETRIKA (2022)

Article Multidisciplinary Sciences

SMOTE-CD: SMOTE for compositional data

Teo Nguyen, Kerrie Mengersen, Damien Sous, Benoit Liquet

Summary: This paper proposes an adaptation of the SMOTE technique called SMOTE for Compositional Data (SMOTE-CD) to address the issue of imbalanced compositional data. SMOTE-CD generates synthetic examples using compositional data operations and improves performance in various regression models. However, the impact of oversampling on performance varies depending on the model and data.

PLOS ONE (2023)

Article Psychology, Multidisciplinary

Compositional Data Analysis Tutorial

Michael Smithson, Stephen B. Broomell

Summary: This article introduces techniques for dealing with dependency in data where numerical data sum to a constant for individual cases, known as compositional or ipsative data. Despite falling out of fashion, compositional data are common in psychological research and can provide unique insights. Sound methods for analyzing compositional data have been developed since the 1980s, and this article aims to enable researchers to analyze compositional data effectively.

PSYCHOLOGICAL METHODS (2022)

Article Mathematics, Interdisciplinary Applications

Compositional Data Analysis

Michael Greenacre

Summary: Compositional data are nonnegative data with a constant-sum constraint, with logratios as the fundamental transformation. Combining components can alleviate the issue of zero values. Various statistical analyses can be performed after transforming the data into logratios.

ANNUAL REVIEW OF STATISTICS AND ITS APPLICATION, VOL 8, 2021 (2021)

Article Computer Science, Information Systems

Analysis of missing data and comparing the accuracy of imputation methods using wheat crop data

Preeti Saini, Bharti Nagpal

Summary: The study focuses on imputing missing data in a wheat crop yield dataset to improve crop estimation and production forecasting. Different imputation techniques are explored and evaluated for their performance. The results show that the Arithmetic Average Replacement method performs well among the statistical methods, while the MissForest and MICE methods perform well among the machine-learning-based methods.

MULTIMEDIA TOOLS AND APPLICATIONS (2023)

Article Biochemical Research Methods

Principal microbial groups: compositional alternative to phylogenetic grouping of microbiome data

Asli Boyraz, Vera Pawlowsky-Glahn, Juan Jose Egozcue, Aybar Can Acar

Summary: This study presents a novel approach that groups Operational Taxonomical Units (OTUs) based on relative abundances using principal balances, providing an alternative to taxon grouping. The proposed method has potential applications in dimensionality reduction and construction of microbial balances for disease prediction, offering a coherent data analysis for biomarker discovery in human microbiota.

BRIEFINGS IN BIOINFORMATICS (2022)

Article Biochemical Research Methods

Sparse least trimmed squares regression with compositional covariates for high-dimensional data

Gianna Serafina Monti, Peter Filzmoser

Summary: High-throughput sequencing technologies provide a large amount of data for microbiome composition analysis, which requires consideration of data sparsity and uniqueness. This article proposes a regression variable selection method that takes into account the special nature of microbiome data, achieving sparsity and robustness in regression coefficient estimates through elastic-net regularization. The practical utility of the method is demonstrated through real-world application and simulation studies.

BIOINFORMATICS (2021)

Article Biochemical Research Methods

phyLoSTM: a novel deep learning model on disease prediction from longitudinal microbiome data

Divya Sharma, Wei Xu

Summary: This study introduces a novel deep learning framework, 'phyLoSTM', for analyzing temporal dependency in longitudinal microbiome sequencing data and predicting diseases in relation to the host's environmental factors. Results show promising performance in simulated and real microbiome studies.

BIOINFORMATICS (2021)

Article Biochemistry & Molecular Biology

Pitfalls in the statistical analysis of microbiome amplicon sequencing data

Hendriek C. Boshuizen, Dennis E. te Beest

Summary: This paper lists 14 statistical methods or approaches that should be generally avoided for microbiome data analysis, either because the assumptions behind them are unlikely to be met or because they are used inappropriately. Researchers should conduct more critical evaluations and choose appropriate methods for microbiome data analysis.

MOLECULAR ECOLOGY RESOURCES (2023)

Article Biochemical Research Methods

C3NA: correlation and consensus-based cross-taxonomy network analysis for compositional microbial data

Kuncheng Song, Yi-Hui Zhou

Summary: This article introduces a user-friendly R package named Correlation and Consensus-based Cross-taxonomy Network Analysis (C3NA) for investigating compositional microbial sequencing data to identify and compare co-occurrence patterns across different taxonomic levels. C3NA was used to analyze two well-studied diseases, colorectal cancer and Crohn's disease. Clusters of study-dependent and disease-dependent taxa were discovered, overlapping with known functional taxa reported by other discovery studies and differential abundance analyses.

BMC BIOINFORMATICS (2022)

Article Geosciences, Multidisciplinary

Classical and Robust Regression Analysis with Compositional Data

K. G. van den Boogaart, P. Filzmoser, K. Hron, M. Templ, R. Tolosana-Delgado

Summary: Compositional data contain valuable information within the relationships between the compositional parts, which can be utilized for regression modeling. Balance coordinates are constructed to interpret regression coefficients and test hypotheses of subcompositional independence. Both classical least-squares regression and robust MM regression were compared within different regression models using a real data set from a geochemical mapping project.

MATHEMATICAL GEOSCIENCES (2021)

Article Automation & Control Systems

Multi-modal hybrid modeling strategy based on Gaussian Mixture Variational Autoencoder and spatial-temporal attention: Application to industrial process prediction

Haifei Peng, Jian Long, Cheng Huang, Shibo Wei, Zhencheng Ye

Summary: This paper proposes a novel multi-modal hybrid modeling strategy (GMVAE-STA) that can effectively extract deep multi-modal representations and complex spatial and temporal relationships, and applies it to industrial process prediction.

CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS (2024)
