Article
Mathematical & Computational Biology
Diego Tomassi, Liliana Forzani, Sabrina Duarte, Ruth M. Pfeiffer
Summary: Recent efforts in characterizing the human microbiome and its relation to chronic diseases have led to advancements in statistical methods for compositional data. Likelihood-based sufficient dimension reduction methods have been developed to find linear combinations that contain all the information in the compositional data regarding an outcome variable. These methods, incorporating variable selection and penalties, address invariance issues arising from the compositional nature of the data and can be applied to continuous or categorical outcomes.
Article
Biochemical Research Methods
M. Luz Calle, Meritxell Pujolassos, Antoni Susin
Summary: coda4microbiome is a new algorithm for analyzing microbiome data in both cross-sectional and longitudinal studies. The algorithm uses penalized regression on log-ratio models for variable selection and infers dynamic microbial signatures through penalized regression on the summary of log-ratio trajectories. The package provides visual representations for interpretation of the analysis and identified microbial signatures.
BMC BIOINFORMATICS
(2023)
Article
Biochemical Research Methods
Shulei Wang
Summary: In this study, a new differential abundance test called the MsRDB test is proposed, which embeds the sequences into a metric space and integrates a multiscale adaptive strategy to identify differentially abundant microbes. Compared with existing methods, the MsRDB test can detect differentially abundant microbes at the finest resolution offered by data and is robust to zero counts, compositional effect, and experimental bias in the microbial compositional dataset.
Article
Geosciences, Multidisciplinary
J. A. Martin-Fernandez, J. J. Egozcue, R. A. Olea, V. Pawlowsky-Glahn
Summary: Compositional data require statistical analysis on a log-ratio basis, with estimates back-transformed to the original units. This paper introduces two methods for recovering original units, demonstrated using geochemical data.
NATURAL RESOURCES RESEARCH
(2021)
Article
Computer Science, Theory & Methods
Michail Tsagris, Abdulaziz Alenazi, Connie Stewart
Summary: This article presents a non-parametric regression approach for analyzing compositional data, using an extension of k-Nearest Neighbours and kernel regression methods, which can accommodate zero values. Simulation studies and real-life data analyses demonstrate that these non-parametric regression methods can make more accurate predictions for complex relationships between compositional response data and Euclidean predictor variables.
STATISTICS AND COMPUTING
(2023)
Article
Microbiology
Laura Sisk-Hackworth, Adrian Ortiz-Velez, Micheal B. Reed, Scott T. Kelley
Summary: Periodontal disease (PD) is a chronic, progressive polymicrobial disease that induces a strong host immune response. Next-generation sequencing (NGS) studies have shown that PD biodiversity increases with pocket depth and PD communities are highly host-specific. By applying compositional data analysis (CoDA) methods, new features associated with PD, including genera Schwartzia and Aerococcus, and the cytokine C-reactive protein, have been identified. Network analysis revealed lower connectivity among taxa in deeper periodontal pockets, indicating a more random microbiome.
FRONTIERS IN MICROBIOLOGY
(2021)
Article
Biology
Pixu Shi, Yuchen Zhou, Anru R. Zhang
Summary: This study introduces a simple, interpretable, and efficient method for estimating compositional data regression using a novel high-dimensional log-error-in-variable regression model to address issues with zero read counts and randomness in covariates.
Article
Multidisciplinary Sciences
Teo Nguyen, Kerrie Mengersen, Damien Sous, Benoit Liquet
Summary: This paper proposes an adaptation of the SMOTE technique called SMOTE for Compositional Data (SMOTE-CD) to address the issue of imbalanced compositional data. SMOTE-CD generates synthetic examples using compositional data operations and improves performance in various regression models. However, the impact of oversampling on performance varies depending on the model and data.
Article
Biochemical Research Methods
Shimeng Huang, Elisabeth Ailer, Niki Kilbertus, Niklas Pfister
Summary: In this study, a kernel-based nonparametric regression and classification framework called KernelBiome is proposed for compositional data. It captures complex signals and automatically adapts model complexity. Experimental results on 33 publicly available microbiome datasets demonstrate its superior predictive performance and interpretability compared to state-of-the-art machine learning methods. Additionally, two novel quantities are proposed to interpret contributions of individual components and the connection between kernels and distances aids interpretability.
PLOS COMPUTATIONAL BIOLOGY
(2023)
Article
Psychology, Multidisciplinary
Michael Smithson, Stephen B. Broomell
Summary: This article introduces techniques for dealing with dependency in data where numerical data sum to a constant for individual cases, known as compositional or ipsative data. Despite falling out of fashion, compositional data are common in psychological research and can provide unique insights. Sound methods for analyzing compositional data have been developed since the 1980s, and this article aims to enable researchers to analyze compositional data effectively.
PSYCHOLOGICAL METHODS
(2022)
Article
Statistics & Probability
Ting Li, Tengfei Li, Zhongyi Zhu, Hongtu Zhu
Summary: This study introduces a new statistical approach to effectively handle the asynchronous relationship between functional and scalar variables measured at different time points, by introducing functional coefficients and kernel weighting methods. The results suggest that education level, baseline disease status, and the APOE4 gene are major contributing factors to the significant relationship between fractional anisotropy density curves and cognitive function.
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
(2022)
Article
Computer Science, Artificial Intelligence
Yimiao Gao, Yuehan Yang
Summary: This paper proposes a method called JETS that utilizes auxiliary models from different groups to estimate the target model. By constructing a penalized framework that combines penalties for the target model and the differences between auxiliary models and the target model, JETS overcomes the challenge of limited samples in high-dimensional studies and obtains stable and accurate estimates, regardless of noisy information in the auxiliary samples.
PATTERN RECOGNITION
(2023)
Article
Mathematics, Interdisciplinary Applications
Michael Greenacre
Summary: Compositional data are nonnegative data with a constant-sum constraint, with logratios as the fundamental transformation. Combining components can alleviate the issue of zero values. Various statistical analyses can be performed after transforming the data into logratios.
ANNUAL REVIEW OF STATISTICS AND ITS APPLICATION, VOL 8, 2021
(2021)
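The logratio idea summarized in the entry above can be illustrated with a small sketch. This is not code from the cited review. It assumes the centered logratio (CLR) as one common choice of logratio transformation, and the function names `closure`, `amalgamate`, `clr`, and `clr_inverse` are hypothetical helpers introduced only for illustration.

```python
import numpy as np

def closure(x):
    """Rescale each row so its parts sum to 1 (constant-sum constraint)."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)

def amalgamate(x, groups):
    """Combine components by summing the parts listed in each group.

    Merging a zero part with a nonzero one is one simple way to
    remove zeros before taking logarithms.
    """
    x = np.asarray(x, dtype=float)
    return np.stack([x[..., list(g)].sum(axis=-1) for g in groups], axis=-1)

def clr(x):
    """Centered logratio: log of each part minus the row mean of the logs."""
    logx = np.log(closure(x))
    return logx - logx.mean(axis=-1, keepdims=True)

def clr_inverse(z):
    """Map CLR coordinates back to proportions that sum to 1."""
    return closure(np.exp(z))

# Illustrative counts with a zero in the second part (made-up numbers).
counts = np.array([[10.0, 0.0, 30.0, 60.0]])

# Combine parts 0 and 1 so that no zero remains, then transform.
merged = amalgamate(counts, groups=[(0, 1), (2,), (3,)])
z = clr(merged)

# Standard multivariate methods can be applied to z; results are then
# mapped back to proportions with clr_inverse.
recovered = clr_inverse(z)
```

CLR coordinates sum to zero in each row, so any downstream method should account for that linear dependence. Which parts to combine is a subject-matter decision rather than a purely numerical one.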
Article
Multidisciplinary Sciences
Etienne J. Orliac, Daniel Trejo Banos, Sven E. Ojavee, Kristi Lall, Reedik Magi, Peter M. Visscher, Matthew R. Robinson
Summary: Genetically informed, deep-phenotyped biobanks are an important research resource, and the recently developed Bayesian grouped mixture of regressions model (GMRM) has been shown to achieve the highest genomic prediction accuracy to date. Compared with other approaches, GMRM outperforms annotation prediction models by 15-18% and improves the discovery of independent loci by 62-65%. The study emphasizes the importance of incorporating MAF and LD information in genetic associations for both genomic prediction and discovery in large-scale individual-level studies.
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
(2022)
Article
Multidisciplinary Sciences
Etienne J. Orliac, Daniel Trejo Banos, Sven E. Ojavee, Kristi Lall, Reedik Magi, Peter M. Visscher, Matthew R. Robinson
Summary: The use of the Bayesian grouped mixture of regressions model (GMRM) in biobanks has shown high genomic prediction accuracy and increased detection of independent loci for genetic association discovery. Considering differences in SNP markers and incorporating prior knowledge of genomic function is crucial for genomic prediction and discovery in large-scale individual-level studies.
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
(2022)
Article
Biology
A. Tank, E. B. Fox, A. Shojaie
Article
Nutrition & Dietetics
Sandi L. Navarro, Aliasghar Tarkhan, Ali Shojaie, Timothy W. Randolph, Haiwei Gu, Danijel Djukovic, Katie J. Osterbauer, Meredith A. Hullar, Mario Kratz, Marian L. Neuhouser, Paul D. Lampe, Daniel Raftery, Johanna W. Lampe
AMERICAN JOURNAL OF CLINICAL NUTRITION
(2019)
Article
Nutrition & Dietetics
Benjamin C. Fu, Meredith A. J. Hullar, Timothy W. Randolph, Adrian A. Franke, Kristine R. Monroe, Iona Cheng, Lynne R. Wilkens, John A. Shepherd, Margaret M. Madeleine, Loic Le Marchand, Unhee Lim, Johanna W. Lampe
AMERICAN JOURNAL OF CLINICAL NUTRITION
(2020)
Article
Biochemical Research Methods
Michael Hellstern, Jing Ma, Kun Yue, Ali Shojaie
Summary: This study focused on improving the existing topology-based pathway enrichment method, NetGSA, through three key enhancements: reducing computation time, integrating pathway databases, and providing interactive visualization. The improved NetGSA outperforms previous versions and other similar methods in efficiency and statistical power.
PLOS COMPUTATIONAL BIOLOGY
(2021)
Article
Nutrition & Dietetics
Destiny A. Mullens, Ivan Ivanov, Meredith A. J. Hullar, Timothy W. Randolph, Johanna W. Lampe, Robert S. Chapkin
Summary: This study investigated the impact of the metabolic phenotype of flaxseed lignan on host gene expression. The findings suggest that a higher conversion of flaxseed lignan to enterolactone (ENL) is associated with a suppressed inflammatory status.
Article
Medicine, Research & Experimental
Sandi L. Navarro, Zihan Zheng, Timothy W. Randolph, Ryotaro Nakamura, Brenda M. Sandmaier, David Hockenbery, Jeannine S. McCune
Summary: Biomarker-guided dosing could potentially improve the effectiveness and safety of cyclophosphamide (CY), but evaluating its association with the area under the CY plasma concentration-time curve (AUC) is time-consuming. This study aimed to identify lipidomic biomarkers associated with 4-hydroxycyclophosphamide (4HCY) formation clearance in hematopoietic cell transplant patients receiving CY. The results showed the feasibility of lipidomics, but further studies in larger samples are needed to optimize CY dosing.
CTS-CLINICAL AND TRANSLATIONAL SCIENCE
(2022)
Article
Pharmacology & Pharmacy
Jeannine S. McCune, Sandi L. Navarro, K. Scott Baker, Linda J. Risler, Brian R. Phillips, Timothy W. Randolph, Laura Shireman, H. Gary Schoch, H. Joachim Deeg, Yuzheng Zhang, Alex Men, Loes Maton, Alwin D. R. Huitema
Summary: A linear regression model of 13 endogenous metabolomic compounds (EMCs) can be used to predict an individual's busulfan clearance (BuCL) before administration. This pharmacometabolomics method is more effective than using a busulfan test dose or pharmacogenomics to guide dosing.
CLINICAL PHARMACOLOGY & THERAPEUTICS
(2023)
Article
Cell Biology
Kristin J. Lastwika, Andrew Kunihiro, Joell L. Solan, Yuzheng Zhang, Lydia R. Taverne, David Shelley, Jung-Hyun Rho, Timothy W. Randolph, Christopher I. Li, Eric L. Grogan, Pierre P. Massion, Annette L. Fitzpatrick, David MacPherson, A. McGarry Houghton, Paul D. Lampe
Summary: Small cell lung cancer (SCLC) triggers the generation of autoantibodies, causing unique paraneoplastic neurological syndromes. We developed a technique to detect autoantibodies directly from patient plasma and found that SCLC patients have significantly higher disease-specific autoantibody signals compared to patients with other cancers. We identified previously unknown autoantibodies produced in response to both intracellular and extracellular tumor antigens in multiple SCLC cohorts and discovered disease-specific posttranslational modifications within targeted extracellular proteins. These findings have implications for the early detection and clinical utility of SCLC.
SCIENCE TRANSLATIONAL MEDICINE
(2023)
Article
Medicine, Research & Experimental
Huajia Zhang, Xiaodong Zhu, Travis J. Friesen, Jeff W. Kwak, Tatyana Pisarenko, Surapat Mekvanich, Mark A. Velasco, Timothy W. Randolph, Julia Kargl, A. McGarry Houghton
Summary: This study reveals the expression of ARG1 in neutrophil lineage cells in non-small cell lung cancer and the active transcription of ARG1 mRNA in tumor-associated neutrophils (TANs). ANXA2 is identified as the major driver of ARG1 mRNA expression in TANs through signaling via the TLR2/MYD88 axis. This study uncovers a novel mechanism in regulating ARG1 mRNA expression in neutrophils and emphasizes the crucial role of neutrophil lineage cells in suppressing tumor-infiltrating lymphocytes.
JOURNAL OF CLINICAL INVESTIGATION
(2022)
Article
Gastroenterology & Hepatology
Meredith A. J. Hullar, Isaac C. Jenkins, Timothy W. Randolph, Keith R. Curtis, Kristine R. Monroe, Thomas Ernst, John A. Shepherd, Daniel O. Stram, Iona Cheng, Bruce S. Kristal, Lynne R. Wilkens, Adrian Franke, Loic Le Marchand, Unhee Lim, Johanna W. Lampe
Summary: This study investigated the association of gut microbiome with hepatic adiposity among different ethnicities. The research found that NAFLD patients from various ethnic groups exhibited differences in bacterial composition and metabolism, but shared similar bacterial metabolic pathways.
Article
Endocrinology & Metabolism
Fayth L. Miles, Sandi L. Navarro, Carly B. Garrison, Timothy W. Randolph, Yuzheng Zhang, Ali Shojaie, Mario Kratz, Meredith A. J. Hullar, Daniel Raftery, Marian L. Neuhouser, Paul D. Lampe, Johanna W. Lampe
Summary: Urinary excretion of enterolactone (ENL), a microbial metabolite of dietary lignans, is associated with plasma protein abundance, a potential link to cancer prevention. Over-representation analysis indicates associations of ENL excretion with estrogen and TNF signaling pathways.
HUMAN NUTRITION & METABOLISM
(2021)
Review
Statistics & Probability
Ali Shojaie
Summary: Network analysis is crucial in various scientific disciplines, especially in biology and medicine where it can predict complex diseases and provide insights into disease mechanisms. Recent statistical machine learning methods have been developed for inferring networks and identifying changes in their structures.
WILEY INTERDISCIPLINARY REVIEWS-COMPUTATIONAL STATISTICS
(2021)
Article
Nutrition & Dietetics
Maggie S. Burhans, Niranjan Balu, Kelsey A. Schmidt, Gail Cromer, Kristina M. Utzschneider, Ellen A. Schur, Sarah E. Holte, Timothy W. Randolph, Mario Kratz
CURRENT DEVELOPMENTS IN NUTRITION
(2020)
Article
Automation & Control Systems
Arjun Sondhi, Ali Shojaie
JOURNAL OF MACHINE LEARNING RESEARCH
(2019)