Article

CONTRASTIVE LATENT VARIABLE MODELING WITH APPLICATION TO CASE-CONTROL SEQUENCING EXPERIMENTS

Journal

ANNALS OF APPLIED STATISTICS
Volume 16, Issue 3, Pages 1268-1291

Publisher

INST MATHEMATICAL STATISTICS-IMS
DOI: 10.1214/21-AOAS1534

Keywords

Latent variable models; RNA sequencing; differential expression; contrastive models; case-control data

Funding

  1. Helmsley Trust
  2. NIH NHLBI [R01 HL133218]
  3. NSF CAREER [AWD1005627]
  4. NIH Human Tumor Atlas Research Program

High-throughput RNA-sequencing technologies are powerful tools for understanding cellular state, but common differential expression methods largely ignore changes in transcriptional correlation. The proposed contrastive latent variable models aim to give a richer portrait of differential expression and to identify the low-dimensional structure of the gene expression shift.
High-throughput RNA-sequencing (RNA-seq) technologies are powerful tools for understanding cellular state. Often, it is of interest to quantify and to summarize changes in cell state that occur between experimental or biological conditions. Differential expression is typically assessed using univariate tests to measure genewise shifts in expression. However, these methods largely ignore changes in transcriptional correlation. Furthermore, there is a need to identify the low-dimensional structure of the gene expression shift to identify collections of genes that change between conditions. Here, we propose contrastive latent variable models designed for count data to create a richer portrait of differential expression in sequencing data. These models disentangle the sources of transcriptional variation in different conditions in the context of an explicit model of variation at baseline. Moreover, we develop a model-based hypothesis testing framework that can test for global and gene subset-specific changes in expression. We evaluate our model through extensive simulations and analyses with count-based gene expression data from perturbation and observational sequencing experiments. We find that our methods effectively summarize and quantify complex transcriptional changes in case-control experimental sequencing data.
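The contrastive idea in the abstract can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the paper's specification: the abstract does not give the model's exact form, so the Poisson likelihood, the gamma-distributed nonnegative factors, the function names, and all dimensions below are hypothetical. Control (background) samples are generated from shared loadings `S` only, while case (foreground) samples add a condition-specific term built from separate loadings `W`.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw one Poisson variate with Knuth's multiplication method (small rates only)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_contrastive_counts(n_background, n_foreground, n_genes,
                                n_shared=2, n_specific=1, seed=0):
    """Simulate count matrices for control (background) and case (foreground) samples.

    Assumed structure (illustrative only):
      background rate = z . S
      foreground rate = z . S + t . W
    where S holds loadings shared by both conditions and W holds
    loadings that are active only in the foreground condition.
    """
    rng = random.Random(seed)

    def gamma():
        # Gamma(shape=2, scale=0.5) has mean 1 and keeps every factor nonnegative.
        return rng.gammavariate(2.0, 0.5)

    S = [[gamma() for _ in range(n_genes)] for _ in range(n_shared)]
    W = [[gamma() for _ in range(n_genes)] for _ in range(n_specific)]

    def rates(latents, loadings):
        # Per-gene Poisson rate: latent factors weighted by their loadings.
        return [sum(z * row[g] for z, row in zip(latents, loadings))
                for g in range(n_genes)]

    background = []
    for _ in range(n_background):
        z = [gamma() for _ in range(n_shared)]
        background.append([sample_poisson(m, rng) for m in rates(z, S)])

    foreground = []
    for _ in range(n_foreground):
        z = [gamma() for _ in range(n_shared)]
        t = [gamma() for _ in range(n_specific)]
        mu = [a + b for a, b in zip(rates(z, S), rates(t, W))]
        foreground.append([sample_poisson(m, rng) for m in mu])

    return background, foreground


# 200 control samples and 200 case samples over 20 hypothetical genes.
background, foreground = simulate_contrastive_counts(200, 200, 20)
```

In this sketch, fitting a contrastive model would amount to recovering `S` and `W` from the two count matrices, with `W` summarizing the variation that appears only in the case samples.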

Recommended

Review Biochemistry & Molecular Biology

The Human Tumor Atlas Network: Charting Tumor Transitions across Space and Time at Single-Cell Resolution

Orit Rozenblatt-Rosen, Aviv Regev, Philipp Oberdoerffer, Tal Nawy, Anna Hupalowska, Jennifer E. Rood, Orr Ashenberg, Ethan Cerami, Robert J. Coffey, Emek Demir, Li Ding, Edward D. Esplin, James M. Ford, Jeremy Goecks, Sharmistha Ghosh, Joe W. Gray, Justin Guinney, Sean E. Hanlon, Shannon K. Hughes, E. Shelley Hwang, Christine A. Iacobuzio-Donahue, Judit Jane-Valbuena, Bruce E. Johnson, Ken S. Lau, Tracy Lively, Sarah A. Mazzilli, Dana Pe'er, Sandro Santagata, Alex K. Shalek, Denis Schapiro, Michael P. Snyder, Peter K. Sorger, Avrum E. Spira, Sudhir Srivastava, Kai Tan, Robert B. West, Elizabeth H. Williams

Article Multidisciplinary Sciences

Measuring the predictability of life outcomes with a scientific mass collaboration

Matthew J. Salganik, Ian Lundberg, Alexander T. Kindel, Caitlin E. Ahearn, Khaled Al-Ghoneim, Abdullah Almaatouq, Drew M. Altschul, Jennie E. Brand, Nicole Bohme Carnegie, Ryan James Compton, Debanjan Datta, Thomas Davidson, Anna Filippova, Connor Gilroy, Brian J. Goode, Eaman Jahani, Ridhi Kashyap, Antje Kirchner, Stephen McKay, Allison C. Morgan, Alex Pentland, Kivan Polimis, Louis Raes, Daniel E. Rigobon, Claudia V. Roberts, Diana M. Stanescu, Yoshihiko Suhara, Adaner Usmani, Erik H. Wang, Muna Adem, Abdulla Alhajri, Bedoor AlShebli, Redwane Amin, Ryan B. Amos, Lisa P. Argyle, Livia Baer-Bositis, Moritz Buechi, Bo-Ryehn Chung, William Eggert, Gregory Faletto, Zhilin Fan, Jeremy Freese, Tejomay Gadgil, Josh Gagne, Yue Gao, Andrew Halpern-Manners, Sonia P. Hashim, Sonia Hausen, Guanhua He, Kimberly Higuera, Bernie Hogan, Ilana M. Horwitz, Lisa M. Hummel, Naman Jain, Kun Jin, David Jurgens, Patrick Kaminski, Areg Karapetyan, E. H. Kim, Ben Leizman, Naijia Liu, Malte Moeser, Andrew E. Mack, Mayank Mahajan, Noah Mandell, Helge Marahrens, Diana Mercado-Garcia, Viola Mocz, Katariina Mueller-Gastell, Ahmed Musse, Qiankun Niu, William Nowak, Hamidreza Omidvar, Andrew Or, Karen Ouyang, Katy M. Pinto, Ethan Porter, Kristin E. Porter, Crystal Qian, Tamkinat Rauf, Anahit Sargsyan, Thomas Schaffner, Landon Schnabel, Bryan Schonfeld, Ben Sender, Jonathan D. Tang, Emma Tsurkov, Austin van Loon, Onur Varol, Xiafei Wang, Zhi Wang, Julia Wang, Flora Wang, Samantha Weissman, Kirstie Whitaker, Maria K. Wolters, Wei Lee Woo, James Wu, Catherine Wu, Kengran Yang, Jingwen Yin, Bingyu Zhao, Chenyun Zhu, Jeanne Brooks-Gunn, Barbara E. Engelhardt, Moritz Hardt, Dean Knox, Karen Levy, Arvind Narayanan, Brandon M. Stewart, Duncan J. Watts, Sara McLanahan

PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA (2020)

Article Biochemical Research Methods

A robust nonlinear low-dimensional manifold for single cell RNA-seq data

Archit Verma, Barbara E. Engelhardt

BMC BIOINFORMATICS (2020)

Article Medical Informatics

Sparse multi-output Gaussian processes for online medical time series prediction

Li-Fang Cheng, Bianca Dumitrascu, Gregory Darnell, Corey Chivers, Michael Draugelis, Kai Li, Barbara E. Engelhardt

BMC MEDICAL INFORMATICS AND DECISION MAKING (2020)

Article Multidisciplinary Sciences

Joint analysis of expression levels and histological images identifies genes associated with tissue morphology

Jordan T. Ash, Gregory Darnell, Daniel Munro, Barbara E. Engelhardt

Summary: The study aims to associate features of stained tissue images with high-dimensional genomic markers, identifying gene sets related to cell types and extracellular structures, and exploring how genetic variation regulates population variation in tissue morphological traits.

NATURE COMMUNICATIONS (2021)

Article Multidisciplinary Sciences

Optimal marker gene selection for cell type discrimination in single cell analyses

Bianca Dumitrascu, Soledad Villar, Dustin G. Mixon, Barbara E. Engelhardt

Summary: Single-cell technologies allow for characterization of complex cell populations at unprecedented scale and resolution. The method proposed in this study uses linear programming for supervised genetic marker selection and provides a Python package scGeneFit for implementation.

NATURE COMMUNICATIONS (2021)

Article Biochemical Research Methods

Causal network inference from gene transcriptional time-series response to glucocorticoids

Jonathan Lu, Bianca Dumitrascu, Ian C. McDowell, Brian Jo, Alejandro Barrera, Linda K. Hong, Sarah M. Leichter, Timothy E. Reddy, Barbara E. Engelhardt

Summary: BETS is a method for inferring causal gene networks from gene expression time series. Its efficiency and parallelization allow for quick analysis of large datasets, and it performs competitively in benchmark testing. Validation against external data indicates that BETS can accurately infer activating or inhibitory causal effects.

PLOS COMPUTATIONAL BIOLOGY (2021)

Article Multidisciplinary Sciences

A self-exciting point process to study multicellular spatial signaling patterns

Archit Verma, Siddhartha G. Jena, Danielle R. Isakov, Kazuhiro Aoki, Jared E. Toettcher, Barbara E. Engelhardt

Summary: This study proposes a spatiotemporal model of dynamic cell signaling based on Hawkes processes, which can capture both the autonomous behavior of single cells and the interactions of cells with their neighbors simultaneously. The model is applicable to tissues composed of heterogeneous cell types and can identify drug-induced signaling deficits and characterize signaling changes across different cell populations.

PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA (2021)

Article Health Care Sciences & Services

Guiding Efficient, Effective, and Patient-Oriented Electrolyte Replacement in Critical Care: An Artificial Intelligence Reinforcement Learning Approach

Niranjani Prasad, Aishwarya Mandyam, Corey Chivers, Michael Draugelis, C. William Hanson, Barbara E. Engelhardt, Krzysztof Laudanski

Summary: This study develops a data-driven clinical decision support tool that uses reinforcement learning algorithms to recommend personalized electrolyte replacement policies for ICU patients. The tool reduces excessive electrolyte replacements, improves safety, precision, efficacy, and cost, and shows robust performance across patient cohorts and hospital systems.

JOURNAL OF PERSONALIZED MEDICINE (2022)

Article Biochemistry & Molecular Biology

Towards 'end-to-end' analysis and understanding of biological timecourse data

Siddhartha G. Jena, Alexander G. Goglia, Barbara E. Engelhardt

Summary: Petabytes of complex live cell and tissue imaging data are generated every year, holding great promise for understanding biology. However, the current methods for analyzing and mining these data are scattered and user-specific, hindering the possibility of unified analysis across different datasets.

BIOCHEMICAL JOURNAL (2022)

Article Biology

Telescoping bimodal latent Dirichlet allocation to identify expression QTLs across tissues

Ariel D. H. Gewirtz, F. William Townes, Barbara E. Engelhardt

Summary: This study presents a telescoping bimodal latent Dirichlet allocation (TBLDA) framework for learning shared topics across gene expression and genotype data, allowing multiple RNA sequencing samples to correspond to a single individual's genotype. The TBLDA model captures meaningful biological signal and identifies associations within and across tissue types. The model can handle datasets with nested structure and works directly on raw sequencing count data.

LIFE SCIENCE ALLIANCE (2022)

Article Biochemical Research Methods

Nonnegative spatial factorization applied to spatial genomics

F. William Townes, Barbara E. Engelhardt

Summary: Nonnegative matrix factorization (NMF) is a widely used method for analyzing high-dimensional count data, but it lacks the ability to incorporate known structure between observations. We present a new model called nonnegative spatial factorization (NSF) that addresses this limitation and achieves better accuracy and prediction performance than existing methods on spatial transcriptomics datasets. We also propose a hybrid extension of NSF that combines spatial and nonspatial components to quantify spatial importance. A TensorFlow implementation of NSF is available for researchers to use.

NATURE METHODS (2023)

Article Biochemical Research Methods

Alignment of spatial genomics data using deep Gaussian processes

Andrew Jones, F. William Townes, Didong Li, Barbara E. Engelhardt

Summary: Gaussian Process Spatial Alignment (GPSA) aligns multiple spatially resolved genomics and histology datasets, improving downstream analysis by enabling complex spatially aware analyses that are impossible or inaccurate with unaligned data.

NATURE METHODS (2023)

Proceedings Paper Computer Science, Interdisciplinary Applications

COP-E-CAT: Cleaning and Organization Pipeline for EHR Computational and Analytic Tasks

Aishwarya Mandyam, Elizabeth C. Yoo, Jeff Soules, Krzysztof Laudanski, Barbara E. Engelhardt

Summary: COP-E-CAT is open-source software for cleaning and organizing EHR data using comparable preprocessing strategies to enhance reproducibility and comparability. It enables users to select filtering characteristics and preprocess covariates to generate data structures for downstream analysis tasks, improving EHR accessibility for a wider range of researchers.

12TH ACM CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY, AND HEALTH INFORMATICS (ACM-BCB 2021) (2021)
