☆ 4.4 Article

KERNEL-PENALIZED REGRESSION FOR ANALYSIS OF MICROBIOME DATA

ANNALS OF APPLIED STATISTICS (2018)

Journal

ANNALS OF APPLIED STATISTICS

Volume 12, Issue 1, Pages 540-566

Publisher

INST MATHEMATICAL STATISTICS-IMS

DOI: 10.1214/17-AOAS1102

Keywords

Compositional data; distance-based analysis; kernel methods; microbial community data; penalized regression

Categories

Statistics & Probability

Funding

National Institutes of Health [P01-CA168530, U01-CA162077, R01-GM114029]
National Science Foundation [DMS-1161565, DMS-1561814]

Abstract

The analysis of human microbiome data is often based on dimension-reduced graphical displays and clusterings derived from vectors of microbial abundances in each sample. Common to these ordination methods is the use of biologically motivated definitions of similarity. Principal coordinate analysis, in particular, is often performed using ecologically defined distances, allowing analyses to incorporate context-dependent, non-Euclidean structure. In this paper, we go beyond dimension-reduced ordination methods and describe a framework of high-dimensional regression models that extends these distance-based methods. In particular, we use kernel-based methods to show how to incorporate a variety of extrinsic information, such as phylogeny, into penalized regression models that estimate taxon-specific associations with a phenotype or clinical outcome. Further, we show how this regression framework can be used to address the compositional nature of multivariate predictors composed of relative abundances; that is, vectors whose entries sum to a constant. We illustrate this approach with several simulations using data from two recent studies on gut and vaginal microbiomes. We conclude with an application to our own data, where we also incorporate a significance test for the estimated coefficients that represent associations between microbial abundance and percent fat.

Reviews

Primary Rating

4.4

Not enough ratings

Recommended

Article Mathematical & Computational Biology

Sufficient dimension reduction for compositional data

Diego Tomassi, Liliana Forzani, Sabrina Duarte, Ruth M. Pfeiffer

Summary: Recent efforts in characterizing the human microbiome and its relation to chronic diseases have led to advancements in statistical methods for compositional data. Likelihood-based sufficient dimension reduction methods have been developed to find linear combinations that contain all the information in the compositional data regarding an outcome variable. These methods, incorporating variable selection and penalties, address invariance issues arising from the compositional nature of the data and can be applied to continuous or categorical outcomes.

BIOSTATISTICS (2021)

Article Biochemical Research Methods

coda4microbiome: compositional data analysis for microbiome cross-sectional and longitudinal studies

M. Luz Calle, Meritxell Pujolassos, Antoni Susin

Summary: coda4microbiome is a new algorithm for analyzing microbiome data in both cross-sectional and longitudinal studies. The algorithm uses penalized regression on log-ratio models for variable selection and infers dynamic microbial signatures through penalized regression on the summary of log-ratio trajectories. The package provides visual representations for interpretation of the analysis and identified microbial signatures.

BMC BIOINFORMATICS (2023)

Article Biochemical Research Methods

Multiscale adaptive differential abundance analysis in microbial compositional data

Shulei Wang

Summary: In this study, a new differential abundance test called the MsRDB test is proposed, which embeds the sequences into a metric space and integrates a multiscale adaptive strategy to identify differentially abundant microbes. Compared with existing methods, the MsRDB test can detect differentially abundant microbes at the finest resolution offered by data and is robust to zero counts, compositional effect, and experimental bias in the microbial compositional dataset.

BIOINFORMATICS (2023)

Article Geosciences, Multidisciplinary

Units Recovery Methods in Compositional Data Analysis

J. A. Martin-Fernandez, J. J. Egozcue, R. A. Olea, V. Pawlowsky-Glahn

Summary: Compositional data require statistical analysis on a log-ratio basis, with estimates back-transformed to the original units. This paper introduces two methods for recovering original units, demonstrated using geochemical data.

NATURAL RESOURCES RESEARCH (2021)

Article Computer Science, Theory & Methods

Flexible non-parametric regression models for compositional response data with zeros

Michail Tsagris, Abdulaziz Alenazi, Connie Stewart

Summary: This article presents a non-parametric regression approach for analyzing compositional data, using an extension of k-Nearest Neighbours and kernel regression methods, which can accommodate zero values. Simulation studies and real-life data analyses demonstrate that these non-parametric regression methods can make more accurate predictions for complex relationships between compositional response data and Euclidean predictor variables.

STATISTICS AND COMPUTING (2023)

Article Microbiology

Compositional Data Analysis of Periodontal Disease Microbial Communities

Laura Sisk-Hackworth, Adrian Ortiz-Velez, Micheal B. Reed, Scott T. Kelley

Summary: Periodontal disease (PD) is a chronic, progressive polymicrobial disease that induces a strong host immune response. Next-generation sequencing (NGS) studies have shown that PD biodiversity increases with pocket depth and PD communities are highly host-specific. By applying compositional data analysis (CoDA) methods, new features associated with PD, including genera Schwartzia and Aerococcus, and the cytokine C-reactive protein, have been identified. Network analysis revealed lower connectivity among taxa in deeper periodontal pockets, indicating a more random microbiome.

FRONTIERS IN MICROBIOLOGY (2021)

Article Biology

High-dimensional log-error-in-variable regression with applications to microbial compositional data analysis

Pixu Shi, Yuchen Zhou, Anru R. Zhang

Summary: This study introduces a simple, interpretable, and efficient method for estimating compositional data regression using a novel high-dimensional log-error-in-variable regression model to address issues with zero read counts and randomness in covariates.

BIOMETRIKA (2022)

Article Multidisciplinary Sciences

SMOTE-CD: SMOTE for compositional data

Teo Nguyen, Kerrie Mengersen, Damien Sous, Benoit Liquet

Summary: This paper proposes an adaptation of the SMOTE technique called SMOTE for Compositional Data (SMOTE-CD) to address the issue of imbalanced compositional data. SMOTE-CD generates synthetic examples using compositional data operations and improves performance in various regression models. However, the impact of oversampling on performance varies depending on the model and data.

PLOS ONE (2023)

Article Biochemical Research Methods

Supervised learning and model analysis with compositional data

Shimeng Huang, Elisabeth Ailer, Niki Kilbertus, Niklas Pfister

Summary: In this study, a kernel-based nonparametric regression and classification framework called KernelBiome is proposed for compositional data. It captures complex signals and automatically adapts model complexity. Experimental results on 33 publicly available microbiome datasets demonstrate its superior predictive performance and interpretability compared to state-of-the-art machine learning methods. Additionally, two novel quantities are proposed to interpret contributions of individual components and the connection between kernels and distances aids interpretability.

PLOS COMPUTATIONAL BIOLOGY (2023)

Article Psychology, Multidisciplinary

Compositional Data Analysis Tutorial

Michael Smithson, Stephen B. Broomell

Summary: This article introduces techniques for dealing with dependency in data where numerical data sum to a constant for individual cases, known as compositional or ipsative data. Despite falling out of fashion, compositional data are common in psychological research and can provide unique insights. Sound methods for analyzing compositional data have been developed since the 1980s, and this article aims to enable researchers to analyze compositional data effectively.

PSYCHOLOGICAL METHODS (2022)

Article Statistics & Probability

Regression Analysis of Asynchronous Longitudinal Functional and Scalar Data

Ting Li, Tengfei Li, Zhongyi Zhu, Hongtu Zhu

Summary: This study introduces a new statistical approach to effectively handle the asynchronous relationship between functional and scalar variables measured at different time points, by introducing functional coefficients and kernel weighting methods. The results suggest that education level, baseline disease status, and the APOE4 gene are major contributing factors to the significant relationship between fractional anisotropy density curves and cognitive function.

JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION (2022)

Article Computer Science, Artificial Intelligence

Transfer learning on stratified data: joint estimation transferred from strata

Yimiao Gao, Yuehan Yang

Summary: This paper proposes a method called JETS that utilizes auxiliary models from different groups to estimate the target model. By constructing a penalized framework that combines penalties for the target model and the differences between auxiliary models and the target model, JETS overcomes the challenge of limited samples in high-dimensional studies and obtains stable and accurate estimates, regardless of noisy information in the auxiliary samples.

PATTERN RECOGNITION (2023)

Article Mathematics, Interdisciplinary Applications

Compositional Data Analysis

Michael Greenacre

Summary: Compositional data are nonnegative data with a constant-sum constraint, with logratios as the fundamental transformation. Combining components can alleviate the issue of zero values. Various statistical analyses can be performed after transforming the data into logratios.

ANNUAL REVIEW OF STATISTICS AND ITS APPLICATION, VOL 8, 2021 (2021)

Article Multidisciplinary Sciences

Improving GWAS discovery and genomic prediction accuracy in biobank data

Etienne J. Orliac, Daniel Trejo Banos, Sven E. Ojavee, Kristi Lall, Reedik Magi, Peter M. Visscher, Matthew R. Robinson

Summary: Genetically informed, deep-phenotyped biobanks are an important research resource, and the recently developed Bayesian grouped mixture of regressions model (GMRM) has been shown to achieve the highest genomic prediction accuracy to date. Compared with other approaches, GMRM outperforms annotation prediction models by 15-18% and improves the discovery of independent loci by 62-65%. The study emphasizes the importance of incorporating MAF and LD information in genetic associations for both genomic prediction and discovery in large-scale individual-level studies.

PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA (2022)

Article Multidisciplinary Sciences

Improving GWAS discovery and genomic prediction accuracy in biobank data

Etienne J. Orliac, Daniel Trejo Banos, Sven E. Ojavee, Kristi Lall, Reedik Magi, Peter M. Visscher, Matthew R. Robinson

Summary: The use of the Bayesian grouped mixture of regressions model (GMRM) in biobanks has shown high genomic prediction accuracy and increased detection of independent loci for genetic association discovery. Considering differences in SNP markers and incorporating prior knowledge of genomic function is crucial for genomic prediction and discovery in large-scale individual-level studies.

PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA (2022)

Article Biology

Identifiability and estimation of structural vector autoregressive models for subsampled and mixed-frequency time series

A. Tank, E. B. Fox, A. Shojaie

BIOMETRIKA (2019)

Article Nutrition & Dietetics

Plasma metabolomics profiles suggest beneficial effects of a low-glycemic load dietary pattern on inflammation and energy metabolism

Sandi L. Navarro, Aliasghar Tarkhan, Ali Shojaie, Timothy W. Randolph, Haiwei Gu, Danijel Djukovic, Katie J. Osterbauer, Meredith A. Hullar, Mario Kratz, Marian L. Neuhouser, Paul D. Lampe, Daniel Raftery, Johanna W. Lampe

AMERICAN JOURNAL OF CLINICAL NUTRITION (2019)

Article Nutrition & Dietetics

Associations of plasma trimethylamine N-oxide, choline, carnitine, and betaine with inflammatory and cardiometabolic risk biomarkers and the fecal microbiome in the Multiethnic Cohort Adiposity Phenotype Study

Benjamin C. Fu, Meredith A. J. Hullar, Timothy W. Randolph, Adrian A. Franke, Kristine R. Monroe, Iona Cheng, Lynne R. Wilkens, John A. Shepherd, Margaret M. Madeleine, Loic Le Marchand, Unhee Lim, Johanna W. Lampe

AMERICAN JOURNAL OF CLINICAL NUTRITION (2020)

Article Biochemical Research Methods

netgsa: Fast computation and interactive visualization for topology-based pathway enrichment analysis

Michael Hellstern, Jing Ma, Kun Yue, Ali Shojaie

Summary: This study focused on improving the existing topology-based pathway enrichment method, NetGSA, through three key enhancements: reducing computation time, integrating pathway databases, and providing interactive visualization. The improved NetGSA outperforms in efficiency and statistical power compared to previous versions and other similar methods.

PLOS COMPUTATIONAL BIOLOGY (2021)

Article Nutrition & Dietetics

Personalized Nutrition Using Microbial Metabolite Phenotype to Stratify Participants and Non-Invasive Host Exfoliomics Reveal the Effects of Flaxseed Lignan Supplementation in a Placebo-Controlled Crossover Trial

Destiny A. Mullens, Ivan Ivanov, Meredith A. J. Hullar, Timothy W. Randolph, Johanna W. Lampe, Robert S. Chapkin

Summary: This study investigated the impact of the metabolic phenotype of flaxseed lignan on host gene expression. The findings suggest that a higher conversion of flaxseed lignan to enterolactone (ENL) is associated with a suppressed inflammatory status.

NUTRIENTS (2022)

Article Medicine, Research & Experimental

Lipidomics of cyclophosphamide 4-hydroxylation in patients receiving post-transplant cyclophosphamide

Sandi L. Navarro, Zihan Zheng, Timothy W. Randolph, Ryotaro Nakamura, Brenda M. Sandmaier, David Hockenbery, Jeannine S. McCune

Summary: Biomarker-guided dosing could potentially improve the effectiveness and safety of cyclophosphamide (CY), but evaluating its association with CY plasma concentration-time curve (AUC) is time-consuming. This study aimed to identify lipidomic biomarkers associated with 4-hydroxycyclophosphamide (4HCY) formation clearance in hematopoietic cell transplant patients receiving CY. The results showed the feasibility of lipidomics but further studies are needed to optimize CY dosing in larger samples.

CTS-CLINICAL AND TRANSLATIONAL SCIENCE (2022)

Article Pharmacology & Pharmacy

Prediction of Busulfan Clearance by Predose Plasma Metabolomic Profiling

Jeannine S. McCune, Sandi L. Navarro, K. Scott Baker, Linda J. Risler, Brian R. Phillips, Timothy W. Randolph, Laura Shireman, H. Gary Schoch, H. Joachim Deeg, Yuzheng Zhang, Alex Men, Loes Maton, Alwin D. R. Huitema

Summary: A linear regression model of 13 endogenous metabolomic compounds (EMCs) can be used to predict an individual's busulfan clearance (BuCL) before administration. This pharmacometabolomics method is more effective than using a busulfan test dose or pharmacogenomics to guide dosing.

CLINICAL PHARMACOLOGY & THERAPEUTICS (2023)

Article Cell Biology

Posttranslational modifications induce autoantibodies with risk prediction capability in patients with small cell lung cancer

Kristin J. Lastwika, Andrew Kunihiro, Joell L. Solan, Yuzheng Zhang, Lydia R. Taverne, David Shelley, Jung-Hyun Rho, Timothy W. Randolph, Christopher I. Li, Eric L. Grogan, Pierre P. Massion, Annette L. Fitzpatrick, David MacPherson, A. McGarry Houghton, Paul D. Lampe

Summary: Small cell lung cancer (SCLC) triggers the generation of autoantibodies, causing unique paraneoplastic neurological syndromes. We developed a technique to detect autoantibodies directly from patient plasma and found that SCLC patients have significantly higher disease-specific autoantibody signals compared to patients with other cancers. We identified previously unknown autoantibodies produced in response to both intracellular and extracellular tumor antigens in multiple SCLC cohorts and discovered disease-specific posttranslational modifications within targeted extracellular proteins. These findings have implications for the early detection and clinical utility of SCLC.

SCIENCE TRANSLATIONAL MEDICINE (2023)

Article Medicine, Research & Experimental

Annexin A2/TLR2/MYD88 pathway induces arginase 1 expression in tumor-associated neutrophils

Huajia Zhang, Xiaodong Zhu, Travis J. Friesen, Jeff W. Kwak, Tatyana Pisarenko, Surapat Mekvanich, Mark A. Velasco, Timothy W. Randolph, Julia Kargl, A. McGarry Houghton

Summary: This study reveals the expression of ARG1 in neutrophil lineage cells in non-small cell lung cancer and the active transcription of ARG1 mRNA in tumor-associated neutrophils (TANs). ANXA2 is identified as the major driver of ARG1 mRNA expression in TANs through signaling via the TLR2/MYD88 axis. This study uncovers a novel mechanism regulating ARG1 mRNA expression in neutrophils and emphasizes the crucial role of neutrophil lineage cells in suppressing tumor-infiltrating lymphocytes.

JOURNAL OF CLINICAL INVESTIGATION (2022)

Article Gastroenterology & Hepatology

Associations of the gut microbiome with hepatic adiposity in the Multiethnic Cohort Adiposity Phenotype Study

Meredith A. J. Hullar, Isaac C. Jenkins, Timothy W. Randolph, Keith R. Curtis, Kristine R. Monroe, Thomas Ernst, John A. Shepherd, Daniel O. Stram, Iona Cheng, Bruce S. Kristal, Lynne R. Wilkens, Adrian Franke, Loic Le Marchand, Unhee Lim, Johanna W. Lampe

Summary: This study investigated the association of gut microbiome with hepatic adiposity among different ethnicities. The research found that NAFLD patients from various ethnic groups exhibited differences in bacterial composition and metabolism, but shared similar bacterial metabolic pathways.

GUT MICROBES (2021)

Article Endocrinology & Metabolism

Urinary enterolactone is associated with plasma proteins related to immunity and cancer development in healthy participants on controlled diets

Fayth L. Miles, Sandi L. Navarro, Carly B. Garrison, Timothy W. Randolph, Yuzheng Zhang, Ali Shojaie, Mario Kratz, Meredith A. J. Hullar, Daniel Raftery, Marian L. Neuhouser, Paul D. Lampe, Johanna W. Lampe

Summary: Urinary excretion of the microbial metabolite ENL of dietary lignans is associated with plasma protein abundance, potentially linking to cancer prevention. Over-representation analysis indicates associations of ENL excretion with estrogen and TNF signaling pathways.

HUMAN NUTRITION & METABOLISM (2021)

Review Statistics & Probability

Differential network analysis: A statistical perspective

Ali Shojaie

Summary: Network analysis is crucial in various scientific disciplines, especially in biology and medicine where it can predict complex diseases and provide insights into disease mechanisms. Recent statistical machine learning methods have been developed for inferring networks and identifying changes in their structures.

WILEY INTERDISCIPLINARY REVIEWS-COMPUTATIONAL STATISTICS (2021)

Article Nutrition & Dietetics

Impact of the Analytical Approach on the Reliability of MRI-Based Assessment of Hepatic Fat Content

Maggie S. Burhans, Niranjan Balu, Kelsey A. Schmidt, Gail Cromer, Kristina M. Utzschneider, Ellen A. Schur, Sarah E. Holte, Timothy W. Randolph, Mario Kratz

CURRENT DEVELOPMENTS IN NUTRITION (2020)

Article Automation & Control Systems

The Reduced PC-Algorithm: Improved Causal Structure Learning in Large Random Networks

Arjun Sondhi, Ali Shojaie

JOURNAL OF MACHINE LEARNING RESEARCH (2019)
