Article

CONTRASTIVE LATENT VARIABLE MODELING WITH APPLICATION TO CASE-CONTROL SEQUENCING EXPERIMENTS

ANNALS OF APPLIED STATISTICS (2022)

Journal

ANNALS OF APPLIED STATISTICS

Volume 16, Issue 3, Pages 1268-1291

Publisher

INST MATHEMATICAL STATISTICS-IMS

DOI: 10.1214/21-AOAS1534

Keywords

Latent variable models; RNA sequencing; differential expression; contrastive models; case-control data

Categories

Statistics & Probability

Funding

Helmsley Trust
NIH NHLBI [R01 HL133218]
NSF CAREER [AWD1005627]
NIH Human Tumor Atlas Research Program

Abstract

High-throughput RNA-sequencing technologies are powerful for understanding cellular state, but common methods ignore changes in transcriptional correlation. The proposed contrastive latent variable models aim to create a richer portrait of differential expression and identify the low-dimensional structure of gene expression shift.

High-throughput RNA-sequencing (RNA-seq) technologies are powerful tools for understanding cellular state. Often, it is of interest to quantify and to summarize changes in cell state that occur between experimental or biological conditions. Differential expression is typically assessed using univariate tests to measure genewise shifts in expression. However, these methods largely ignore changes in transcriptional correlation. Furthermore, there is a need to identify the low-dimensional structure of the gene expression shift to identify collections of genes that change between conditions. Here, we propose contrastive latent variable models designed for count data to create a richer portrait of differential expression in sequencing data. These models disentangle the sources of transcriptional variation in different conditions in the context of an explicit model of variation at baseline. Moreover, we develop a model-based hypothesis testing framework that can test for global and gene subset-specific changes in expression. We evaluate our model through extensive simulations and analyses with count-based gene expression data from perturbation and observational sequencing experiments. We find that our methods effectively summarize and quantify complex transcriptional changes in case-control experimental sequencing data.

Recommended

Article Biochemical Research Methods

Benchmarking methods for detecting differential states between conditions from multi-subject single-cell RNA-seq data

Sini Junttila, Johannes Smolander, Laura L. Elo

Summary: This study compared 18 methods for identifying differential state (DS) changes between conditions in multi-subject scRNA-seq data. Pseudobulk methods and mixed models performed best, showing superior statistical performance compared to naive single-cell methods.

BRIEFINGS IN BIOINFORMATICS (2022)

Article Mathematical & Computational Biology

Differential expression of single-cell RNA-seq data using Tweedie models

Himel Mallick, Suvo Chatterjee, Shrabanti Chowdhury, Saptarshi Chatterjee, Ali Rahnavard, Stephanie C. Hicks

Summary: The performance of computational methods and software for identifying differentially expressed features in single-cell RNA sequencing is affected by normalization methods and the choice of experimental platform. The study introduces a Tweedie generalized linear model to model the technological variability in cross-platform scRNA-seq data, resulting in improved statistical power and false discovery rate control.

STATISTICS IN MEDICINE (2022)

Article Biochemical Research Methods

Inferring gene regulatory networks from single-cell gene expression data via deep multi-view contrastive learning

Zerun Lin, Le Ou-Yang

Summary: This paper proposes a multi-view contrastive learning model (DeepMCL) to infer gene regulatory networks (GRNs) from single-cell RNA-seq (scRNA-seq) data collected from multiple data sources or time points. Experimental results support the effectiveness of its contrastive learning and attention mechanisms in integrating multiple data sources for GRN inference.

BRIEFINGS IN BIOINFORMATICS (2023)

Article Biotechnology & Applied Microbiology

Differential abundance testing on single-cell data using k-nearest neighbor graphs

Emma Dann, Neil C. Henderson, Sarah A. Teichmann, Michael D. Morgan, John C. Marioni

Summary: Milo is a scalable statistical framework that performs differential abundance testing by assigning cells to partially overlapping neighborhoods on a k-nearest neighbor graph. It can identify perturbations obscured by discretizing cells into clusters and outperforms alternative testing strategies. Milo is based on cell-cell similarity structure and may be applicable to various single-cell data beyond scRNA-seq.

NATURE BIOTECHNOLOGY (2022)

Article Health Care Sciences & Services

A tail-based test to detect differential expression in RNA-sequencing data

Jiong Chen, Xinlei Mi, Jing Ning, Xuming He, Jianhua Hu

Summary: RNA sequencing data are widely used in biomedical research for biomarker discovery. A tail-based test, derived from quantile regression, has been proposed to compare groups in terms of a specific distribution area instead of a single location. Monte Carlo simulation studies show that this test is generally more powerful and robust in detecting differential expression compared to commonly used tests based on the mean or a single quantile.

STATISTICAL METHODS IN MEDICAL RESEARCH (2021)

Article Multidisciplinary Sciences

Transcriptome profiling of kisspeptin neurons from the mouse arcuate nucleus reveals new mechanisms in estrogenic control of fertility

Balazs Gocz, Eva Rumpler, Miklos Sarvari, Katalin Skrapits, Szabolcs Takacs, Imre Farkas, Veronika Csillag, Sarolta H. Trinh, Zsuzsanna Bardoczi, Yvette Ruska, Norbert Solymosi, Szilard Poliska, Zsuzsanna Szoke, Lucia Bartoloni, Yassine Zouaghi, Andrea Messina, Nelly Pitteloud, Ross C. Anderson, Robert P. Millar, Richard Quinton, Stephen M. Manchishi, William H. Colledge, Erik Hrabovszky

Summary: This study provides a comprehensive characterization of the estrogen-dependent kisspeptin neuron transcriptome and sheds light on the molecular mechanisms of ovary-brain communication, which is important for genetic research on human fertility disorders.

PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA (2022)

Article Biology

Estimating differential latent variable graphical models with applications to brain connectivity

S. Na, M. Kolar, O. Koyejo

Summary: Differential graphical models seek to represent differences in conditional dependence structures between two groups, with an extended setting considered in this manuscript involving latent variable Gaussian graphical models. The proposed two-stage estimation method decomposes the differential network into sparse and low-rank components, demonstrating superior performance in experiments compared to existing methods.

BIOMETRIKA (2021)

Article Genetics & Heredity

Influence of single-cell RNA sequencing data integration on the performance of differential gene expression analysis

Tomasz Kujawa, Michal Marczyk, Joanna Polanska

Summary: Large-scale comprehensive single-cell experiments often lead to batch effects, which need to be corrected during data integration. Data integration is challenging due to the overlapping of biological and technical factors. The choice of integration method is crucial for downstream analysis.

FRONTIERS IN GENETICS (2022)

Article Biochemical Research Methods

Latent variable mixture models to address heterogeneity in patient-reported outcome data

Lisa M. Lix, Olawale Ayilara

Summary: This article aims to review the characteristics and applications of latent variable mixture models (LVMMs) in patient-reported outcome (PRO) data, and provide a demonstration of their use. LVMMs can be used to identify homogeneous sub-groups within a study population based on observed patterns of responses in PRO data. The article focuses on mixture item response theory (IRT) models, which combine latent class analysis with the conventional IRT model. An illustrative example is presented using clinical registry data.

METHODS (2022)

Review Psychology, Multidisciplinary

Recent Integrations of Latent Variable Network Modeling With Psychometric Models

Selena Wang

Summary: This paper introduces latent variable network models and their integration with psychometric models, summarizing developments under network psychometrics and distinguishing graphical models under this framework from other network models. Each model is introduced using unified notations, with all methods accompanied by available R packages for further independent learning.

FRONTIERS IN PSYCHOLOGY (2021)

Article Biochemical Research Methods

Differential Gene Expression in Cancer: An Overrated Analysis?

Jessica Carballido, Rocio Cecchini

Summary: This article presents a different analysis method for marker gene research, starting from the biological significance of known gene groups and evaluating the proportion of differentially expressed genes. The research findings show that the percentage of differentially expressed genes is generally low in gene sets annotated in KEGG. However, the use of differentially expressed genes consistently improves the results of statistical and machine learning models in the training and prediction process.

CURRENT BIOINFORMATICS (2022)

Review Genetics & Heredity

RNA-seq data science: From raw data to effective interpretation

Dhrithi Deshpande, Karishma Chhugani, Yutong Chang, Aaron Karlsberg, Caitlin Loeffler, Jinyang Zhang, Agata Muszynska, Viorel Munteanu, Harry Yang, Jeremy Rotman, Laura Tao, Brunilda Balliu, Elizabeth Tseng, Eleazar Eskin, Fangqing Zhao, Pejman Mohammadi, Pawel P. Labaj, Serghei Mangul

Summary: RNA-seq has become a widely used technology in biology and clinical science due to the development of accurate computational tools by the bioinformatics community. These tools enable the analysis of large amounts of transcriptomic data and help to detect novel exons, assess gene expression, and study alternative splicing. However, it can be challenging to obtain meaningful biological signals from raw RNA-seq data due to the scale of the data and limitations of sequencing technologies. The rapid development of novel computational tools has helped to overcome these challenges and unlock the full potential of RNA-seq.

FRONTIERS IN GENETICS (2023)

Article Biochemical Research Methods

iDESC: identifying differential expression in single-cell RNA sequencing data with multiple subjects

Yunqing Liu, Jiayi Zhao, Taylor S. Adams, Ningya Wang, Jonas C. Schupp, Weimiao Wu, John E. McDonough, Geoffrey L. Chupp, Naftali Kaminski, Zuoheng Wang, Xiting Yan

Summary: iDESC is a method for analyzing single-cell RNA sequencing data that can accurately identify differentially expressed genes by considering subject effect and dropout events. The results from simulated and real datasets demonstrate its superior performance compared to existing methods.

BMC BIOINFORMATICS (2023)

Article Computer Science, Artificial Intelligence

Improving sequential latent variable models with autoregressive flows

Joseph Marino, Lei Chen, Jiawei He, Stephan Mandt

Summary: This approach improves sequence modeling by using autoregressive normalizing flows, which act as a moving frame of reference across time to remove temporal correlations and simplify modeling of higher-level dynamics, applicable both independently and as a component within sequential latent variable models. Results on various datasets show that the proposed method enhances log-likelihood performance over baseline models and illustrates the benefits of using flow-based dynamics in terms of decorrelation and improved generalization properties.

MACHINE LEARNING (2022)

Review Biochemistry & Molecular Biology

The Human Tumor Atlas Network: Charting Tumor Transitions across Space and Time at Single-Cell Resolution

Orit Rozenblatt-Rosen, Aviv Regev, Philipp Oberdoerffer, Tal Nawy, Anna Hupalowska, Jennifer E. Rood, Orr Ashenberg, Ethan Cerami, Robert J. Coffey, Emek Demir, Li Ding, Edward D. Esplin, James M. Ford, Jeremy Goecks, Sharmistha Ghosh, Joe W. Gray, Justin Guinney, Sean E. Hanlon, Shannon K. Hughes, E. Shelley Hwang, Christine A. Iacobuzio-Donahue, Judit Jane-Valbuena, Bruce E. Johnson, Ken S. Lau, Tracy Lively, Sarah A. Mazzilli, Dana Pe'er, Sandro Santagata, Alex K. Shalek, Denis Schapiro, Michael P. Snyder, Peter K. Sorger, Avrum E. Spira, Sudhir Srivastava, Kai Tan, Robert B. West, Elizabeth H. Williams

CELL (2020)

Article Multidisciplinary Sciences

Measuring the predictability of life outcomes with a scientific mass collaboration

Matthew J. Salganik, Ian Lundberg, Alexander T. Kindel, Caitlin E. Ahearn, Khaled Al-Ghoneim, Abdullah Almaatouq, Drew M. Altschul, Jennie E. Brand, Nicole Bohme Carnegie, Ryan James Compton, Debanjan Datta, Thomas Davidson, Anna Filippova, Connor Gilroy, Brian J. Goode, Eaman Jahani, Ridhi Kashyap, Antje Kirchner, Stephen McKay, Allison C. Morgan, Alex Pentland, Kivan Polimis, Louis Raes, Daniel E. Rigobon, Claudia V. Roberts, Diana M. Stanescu, Yoshihiko Suhara, Adaner Usmani, Erik H. Wang, Muna Adem, Abdulla Alhajri, Bedoor AlShebli, Redwane Amin, Ryan B. Amos, Lisa P. Argyle, Livia Baer-Bositis, Moritz Buechi, Bo-Ryehn Chung, William Eggert, Gregory Faletto, Zhilin Fan, Jeremy Freese, Tejomay Gadgil, Josh Gagne, Yue Gao, Andrew Halpern-Manners, Sonia P. Hashim, Sonia Hausen, Guanhua He, Kimberly Higuera, Bernie Hogan, Ilana M. Horwitz, Lisa M. Hummel, Naman Jain, Kun Jin, David Jurgens, Patrick Kaminski, Areg Karapetyan, E. H. Kim, Ben Leizman, Naijia Liu, Malte Moeser, Andrew E. Mack, Mayank Mahajan, Noah Mandell, Helge Marahrens, Diana Mercado-Garcia, Viola Mocz, Katariina Mueller-Gastell, Ahmed Musse, Qiankun Niu, William Nowak, Hamidreza Omidvar, Andrew Or, Karen Ouyang, Katy M. Pinto, Ethan Porter, Kristin E. Porter, Crystal Qian, Tamkinat Rauf, Anahit Sargsyan, Thomas Schaffner, Landon Schnabel, Bryan Schonfeld, Ben Sender, Jonathan D. Tang, Emma Tsurkov, Austin van Loon, Onur Varol, Xiafei Wang, Zhi Wang, Julia Wang, Flora Wang, Samantha Weissman, Kirstie Whitaker, Maria K. Wolters, Wei Lee Woo, James Wu, Catherine Wu, Kengran Yang, Jingwen Yin, Bingyu Zhao, Chenyun Zhu, Jeanne Brooks-Gunn, Barbara E. Engelhardt, Moritz Hardt, Dean Knox, Karen Levy, Arvind Narayanan, Brandon M. Stewart, Duncan J. Watts, Sara McLanahan

PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA (2020)

Article Biochemical Research Methods

A robust nonlinear low-dimensional manifold for single cell RNA-seq data

Archit Verma, Barbara E. Engelhardt

BMC BIOINFORMATICS (2020)

Article Medical Informatics

Sparse multi-output Gaussian processes for online medical time series prediction

Li-Fang Cheng, Bianca Dumitrascu, Gregory Darnell, Corey Chivers, Michael Draugelis, Kai Li, Barbara E. Engelhardt

BMC MEDICAL INFORMATICS AND DECISION MAKING (2020)

Article Multidisciplinary Sciences

Joint analysis of expression levels and histological images identifies genes associated with tissue morphology

Jordan T. Ash, Gregory Darnell, Daniel Munro, Barbara E. Engelhardt

Summary: The study aims to associate features of stained tissue images with high-dimensional genomic markers, identifying gene sets related to cell types and extracellular structures, and exploring how genetic variation regulates population variation in tissue morphological traits.

NATURE COMMUNICATIONS (2021)

Article Multidisciplinary Sciences

Optimal marker gene selection for cell type discrimination in single cell analyses

Bianca Dumitrascu, Soledad Villar, Dustin G. Mixon, Barbara E. Engelhardt

Summary: Single-cell technologies allow for characterization of complex cell populations at unprecedented scale and resolution. The method proposed in this study uses linear programming for supervised genetic marker selection and is implemented in the Python package scGeneFit.

NATURE COMMUNICATIONS (2021)

Article Biochemical Research Methods

Causal network inference from gene transcriptional time-series response to glucocorticoids

Jonathan Lu, Bianca Dumitrascu, Ian C. McDowell, Brian Jo, Alejandro Barrera, Linda K. Hong, Sarah M. Leichter, Timothy E. Reddy, Barbara E. Engelhardt

Summary: BETS is a method for inferring causal gene networks from gene expression time series. Its efficiency and parallelization allow quick analysis of large datasets, and it performs competitively in benchmark testing. Validation against external data shows that BETS can accurately infer activating or inhibitory causal effects.

PLOS COMPUTATIONAL BIOLOGY (2021)

Article Multidisciplinary Sciences

A self-exciting point process to study multicellular spatial signaling patterns

Archit Verma, Siddhartha G. Jena, Danielle R. Isakov, Kazuhiro Aoki, Jared E. Toettcher, Barbara E. Engelhardt

Summary: This study proposes a spatiotemporal model of dynamic cell signaling based on Hawkes processes, which can capture both the autonomous behavior of single cells and the interactions of cells with their neighbors simultaneously. The model is applicable to tissues composed of heterogeneous cell types and can identify drug-induced signaling deficits and characterize signaling changes across different cell populations.

PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA (2021)

Article Health Care Sciences & Services

Guiding Efficient, Effective, and Patient-Oriented Electrolyte Replacement in Critical Care: An Artificial Intelligence Reinforcement Learning Approach

Niranjani Prasad, Aishwarya Mandyam, Corey Chivers, Michael Draugelis, C. William Hanson, Barbara E. Engelhardt, Krzysztof Laudanski

Summary: This study develops a data-driven clinical decision support tool that uses reinforcement learning algorithms to recommend personalized electrolyte replacement policies for ICU patients. The tool reduces excessive electrolyte replacements, improves safety, precision, efficacy, and cost, and shows robust performance across patient cohorts and hospital systems.

JOURNAL OF PERSONALIZED MEDICINE (2022)

Article Biochemistry & Molecular Biology

Towards 'end-to-end' analysis and understanding of biological timecourse data

Siddhartha G. Jena, Alexander G. Goglia, Barbara E. Engelhardt

Summary: Petabytes of complex live cell and tissue imaging data are generated every year, holding great promise for understanding biology. However, the current methods for analyzing and mining these data are scattered and user-specific, hindering the possibility of unified analysis across different datasets.

BIOCHEMICAL JOURNAL (2022)

Article Biology

Telescoping bimodal latent Dirichlet allocation to identify expression QTLs across tissues

Ariel D. H. Gewirtz, F. William Townes, Barbara E. Engelhardt

Summary: This study presents a telescoping bimodal latent Dirichlet allocation (TBLDA) framework for learning shared topics across gene expression and genotype data, allowing for multiple RNA sequencing samples to correspond to a single individual's genotype. The TBLDA model successfully captures meaningful biological signal and identifies associations within and across tissue types. The model is able to handle nested structure datasets and uses raw sequencing count data for analysis.

LIFE SCIENCE ALLIANCE (2022)

Article Biochemical Research Methods

Nonnegative spatial factorization applied to spatial genomics

F. William Townes, Barbara E. Engelhardt

Summary: Nonnegative matrix factorization (NMF) is a widely used method for analyzing high-dimensional count data, but it lacks the ability to incorporate known structure between observations. We present a new model called nonnegative spatial factorization (NSF) that addresses this limitation and achieves better accuracy and prediction performance than existing methods on spatial transcriptomics datasets. We also propose a hybrid extension of NSF that combines spatial and nonspatial components to quantify spatial importance. A TensorFlow implementation of NSF is available for researchers to use.

NATURE METHODS (2023)

Article Biochemical Research Methods

Alignment of spatial genomics data using deep Gaussian processes

Andrew Jones, F. William Townes, Didong Li, Barbara E. Engelhardt

Summary: Gaussian Process Spatial Alignment (GPSA) aligns multiple spatially resolved genomics and histology datasets, improving downstream analysis by enabling complex spatially aware analyses that are impossible or inaccurate with unaligned data.

NATURE METHODS (2023)

Proceedings Paper Computer Science, Interdisciplinary Applications

COP-E-CAT: Cleaning and Organization Pipeline for EHR Computational and Analytic Tasks

Aishwarya Mandyam, Elizabeth C. Yoo, Jeff Soules, Krzysztof Laudanski, Barbara E. Engelhardt

Summary: COP-E-CAT is an open-source software designed for cleaning and organizing EHR data using comparable preprocessing strategies to enhance reproducibility and comparability. It enables users to select filtering characteristics and preprocess covariates to generate data structures for downstream analysis tasks, improving EHR accessibility for a wider range of researchers.

12TH ACM CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY, AND HEALTH INFORMATICS (ACM-BCB 2021) (2021)
