Article
Biochemical Research Methods
Sini Junttila, Johannes Smolander, Laura L. Elo
Summary: This study compared 18 methods for identifying differential state (DS) changes between conditions in multisubject scRNA-seq data and found that pseudobulk methods and mixed models performed best, with superior statistical performance compared to naive single-cell methods.
BRIEFINGS IN BIOINFORMATICS
(2022)
Article
Mathematical & Computational Biology
Himel Mallick, Suvo Chatterjee, Shrabanti Chowdhury, Saptarshi Chatterjee, Ali Rahnavard, Stephanie C. Hicks
Summary: The performance of computational methods and software for identifying differentially expressed features in single-cell RNA sequencing is affected by normalization methods and the choice of experimental platform. The study introduces a Tweedie generalized linear model to model the technological variability in cross-platform scRNA-seq data, resulting in improved statistical power and false discovery rate control.
STATISTICS IN MEDICINE
(2022)
Article
Biochemical Research Methods
Zerun Lin, Le Ou-Yang
Summary: In this paper, a multi-view contrastive learning model (DeepMCL) is proposed to infer gene regulatory networks (GRNs) from single-cell RNA-seq (scRNA-seq) data collected from multiple data sources or time points. The experimental results validate the contrastive learning and attention mechanisms and demonstrate the model's effectiveness in integrating multiple data sources for GRN inference.
BRIEFINGS IN BIOINFORMATICS
(2023)
Article
Biochemical Research Methods
Zerun Lin, Le Ou-Yang
Summary: The paper proposes a multi-view contrastive learning model, DeepMCL, for inferring gene regulatory networks from scRNA-seq data. By utilizing a deep Siamese convolutional neural network and attention mechanism, the model integrates information from multiple data sources, improving the accuracy of network inference.
BRIEFINGS IN BIOINFORMATICS
(2022)
Article
Biotechnology & Applied Microbiology
Emma Dann, Neil C. Henderson, Sarah A. Teichmann, Michael D. Morgan, John C. Marioni
Summary: Milo is a scalable statistical framework that performs differential abundance testing by assigning cells to partially overlapping neighborhoods on a k-nearest neighbor graph. It can identify perturbations obscured by discretizing cells into clusters and outperforms alternative testing strategies. Milo is based on cell-cell similarity structure and may be applicable to various single-cell data beyond scRNA-seq.
NATURE BIOTECHNOLOGY
(2022)
Article
Health Care Sciences & Services
Jiong Chen, Xinlei Mi, Jing Ning, Xuming He, Jianhua Hu
Summary: RNA sequencing data are widely used in biomedical research for biomarker discovery. A tail-based test, derived from quantile regression, has been proposed to compare groups in terms of a specific distribution area instead of a single location. Monte Carlo simulation studies show that this test is generally more powerful and robust in detecting differential expression compared to commonly used tests based on the mean or a single quantile.
STATISTICAL METHODS IN MEDICAL RESEARCH
(2021)
Article
Multidisciplinary Sciences
Balazs Gocz, Eva Rumpler, Miklos Sarvari, Katalin Skrapits, Szabolcs Takacs, Imre Farkas, Veronika Csillag, Sarolta H. Trinh, Zsuzsanna Bardoczi, Yvette Ruska, Norbert Solymosi, Szilard Poliska, Zsuzsanna Szoke, Lucia Bartoloni, Yassine Zouaghi, Andrea Messina, Nelly Pitteloud, Ross C. Anderson, Robert P. Millar, Richard Quinton, Stephen M. Manchishi, William H. Colledge, Erik Hrabovszky
Summary: This study provides a comprehensive characterization of the estrogen-dependent kisspeptin neuron transcriptome and sheds light on the molecular mechanisms of ovary-brain communication, which is important for genetic research on human fertility disorders.
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
(2022)
Article
Biology
S. Na, M. Kolar, O. Koyejo
Summary: Differential graphical models seek to represent differences in conditional dependence structures between two groups, with an extended setting considered in this manuscript involving latent variable Gaussian graphical models. The proposed two-stage estimation method decomposes the differential network into sparse and low-rank components, demonstrating superior performance in experiments compared to existing methods.
Article
Genetics & Heredity
Tomasz Kujawa, Michal Marczyk, Joanna Polanska
Summary: Large-scale comprehensive single-cell experiments often lead to batch effects, which need to be corrected during data integration. Data integration is challenging due to the overlapping of biological and technical factors. The choice of integration method is crucial for downstream analysis.
FRONTIERS IN GENETICS
(2022)
Article
Biochemical Research Methods
Lisa M. Lix, Olawale Ayilara
Summary: This article aims to review the characteristics and applications of latent variable mixture models (LVMMs) in patient-reported outcome (PRO) data, and provide a demonstration of their use. LVMMs can be used to identify homogeneous sub-groups within a study population based on observed patterns of responses in PRO data. The article focuses on mixture item response theory (IRT) models, which combine latent class analysis with the conventional IRT model. An illustrative example is presented using clinical registry data.
Review
Psychology, Multidisciplinary
Selena Wang
Summary: This paper introduces latent variable network models and their integration with psychometric models, summarizing developments under network psychometrics and distinguishing graphical models under this framework from other network models. Each model is introduced using unified notations, with all methods accompanied by available R packages for further independent learning.
FRONTIERS IN PSYCHOLOGY
(2021)
Article
Biochemical Research Methods
Jessica Carballido, Rocio Cecchini
Summary: This article presents a different analysis method for marker gene research, starting from the biological significance of known gene groups and evaluating the proportion of differentially expressed genes. The research findings show that the percentage of differentially expressed genes is generally low in gene sets annotated in KEGG. However, the use of differentially expressed genes consistently improves the results of statistical and machine learning models in the training and prediction process.
CURRENT BIOINFORMATICS
(2022)
Review
Genetics & Heredity
Dhrithi Deshpande, Karishma Chhugani, Yutong Chang, Aaron Karlsberg, Caitlin Loeffler, Jinyang Zhang, Agata Muszynska, Viorel Munteanu, Harry Yang, Jeremy Rotman, Laura Tao, Brunilda Balliu, Elizabeth Tseng, Eleazar Eskin, Fangqing Zhao, Pejman Mohammadi, Pawel P. Labaj, Serghei Mangul
Summary: RNA-seq has become a widely used technology in biology and clinical science due to the development of accurate computational tools by the bioinformatics community. These tools enable the analysis of large amounts of transcriptomic data and help to detect novel exons, assess gene expression, and study alternative splicing. However, it can be challenging to obtain meaningful biological signals from raw RNA-seq data due to the scale of the data and limitations of sequencing technologies. The rapid development of novel computational tools has helped to overcome these challenges and unlock the full potential of RNA-seq.
FRONTIERS IN GENETICS
(2023)
Article
Biochemical Research Methods
Yunqing Liu, Jiayi Zhao, Taylor S. Adams, Ningya Wang, Jonas C. Schupp, Weimiao Wu, John E. McDonough, Geoffrey L. Chupp, Naftali Kaminski, Zuoheng Wang, Xiting Yan
Summary: iDESC is a method for analyzing single-cell RNA sequencing data that can accurately identify differentially expressed genes by considering subject effect and dropout events. The results from simulated and real datasets demonstrate its superior performance compared to existing methods.
BMC BIOINFORMATICS
(2023)
Article
Computer Science, Artificial Intelligence
Joseph Marino, Lei Chen, Jiawei He, Stephan Mandt
Summary: This approach improves sequence modeling by using autoregressive normalizing flows, which act as a moving frame of reference across time to remove temporal correlations and simplify modeling of higher-level dynamics, applicable both independently and as a component within sequential latent variable models. Results on various datasets show that the proposed method enhances log-likelihood performance over baseline models and illustrates the benefits of using flow-based dynamics in terms of decorrelation and improved generalization properties.
Review
Biochemistry & Molecular Biology
Orit Rozenblatt-Rosen, Aviv Regev, Philipp Oberdoerffer, Tal Nawy, Anna Hupalowska, Jennifer E. Rood, Orr Ashenberg, Ethan Cerami, Robert J. Coffey, Emek Demir, Li Ding, Edward D. Esplin, James M. Ford, Jeremy Goecks, Sharmistha Ghosh, Joe W. Gray, Justin Guinney, Sean E. Hanlon, Shannon K. Hughes, E. Shelley Hwang, Christine A. Iacobuzio-Donahue, Judit Jane-Valbuena, Bruce E. Johnson, Ken S. Lau, Tracy Lively, Sarah A. Mazzilli, Dana Pe'er, Sandro Santagata, Alex K. Shalek, Denis Schapiro, Michael P. Snyder, Peter K. Sorger, Avrum E. Spira, Sudhir Srivastava, Kai Tan, Robert B. West, Elizabeth H. Williams
Article
Multidisciplinary Sciences
Matthew J. Salganik, Ian Lundberg, Alexander T. Kindel, Caitlin E. Ahearn, Khaled Al-Ghoneim, Abdullah Almaatouq, Drew M. Altschul, Jennie E. Brand, Nicole Bohme Carnegie, Ryan James Compton, Debanjan Datta, Thomas Davidson, Anna Filippova, Connor Gilroy, Brian J. Goode, Eaman Jahani, Ridhi Kashyap, Antje Kirchner, Stephen McKay, Allison C. Morgan, Alex Pentland, Kivan Polimis, Louis Raes, Daniel E. Rigobon, Claudia V. Roberts, Diana M. Stanescu, Yoshihiko Suhara, Adaner Usmani, Erik H. Wang, Muna Adem, Abdulla Alhajri, Bedoor AlShebli, Redwane Amin, Ryan B. Amos, Lisa P. Argyle, Livia Baer-Bositis, Moritz Buechi, Bo-Ryehn Chung, William Eggert, Gregory Faletto, Zhilin Fan, Jeremy Freese, Tejomay Gadgil, Josh Gagne, Yue Gao, Andrew Halpern-Manners, Sonia P. Hashim, Sonia Hausen, Guanhua He, Kimberly Higuera, Bernie Hogan, Ilana M. Horwitz, Lisa M. Hummel, Naman Jain, Kun Jin, David Jurgens, Patrick Kaminski, Areg Karapetyan, E. H. Kim, Ben Leizman, Naijia Liu, Malte Moeser, Andrew E. Mack, Mayank Mahajan, Noah Mandell, Helge Marahrens, Diana Mercado-Garcia, Viola Mocz, Katariina Mueller-Gastell, Ahmed Musse, Qiankun Niu, William Nowak, Hamidreza Omidvar, Andrew Or, Karen Ouyang, Katy M. Pinto, Ethan Porter, Kristin E. Porter, Crystal Qian, Tamkinat Rauf, Anahit Sargsyan, Thomas Schaffner, Landon Schnabel, Bryan Schonfeld, Ben Sender, Jonathan D. Tang, Emma Tsurkov, Austin van Loon, Onur Varol, Xiafei Wang, Zhi Wang, Julia Wang, Flora Wang, Samantha Weissman, Kirstie Whitaker, Maria K. Wolters, Wei Lee Woo, James Wu, Catherine Wu, Kengran Yang, Jingwen Yin, Bingyu Zhao, Chenyun Zhu, Jeanne Brooks-Gunn, Barbara E. Engelhardt, Moritz Hardt, Dean Knox, Karen Levy, Arvind Narayanan, Brandon M. Stewart, Duncan J. Watts, Sara McLanahan
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
(2020)
Article
Biochemical Research Methods
Archit Verma, Barbara E. Engelhardt
BMC BIOINFORMATICS
(2020)
Article
Medical Informatics
Li-Fang Cheng, Bianca Dumitrascu, Gregory Darnell, Corey Chivers, Michael Draugelis, Kai Li, Barbara E. Engelhardt
BMC MEDICAL INFORMATICS AND DECISION MAKING
(2020)
Article
Multidisciplinary Sciences
Jordan T. Ash, Gregory Darnell, Daniel Munro, Barbara E. Engelhardt
Summary: The study aims to associate features of stained tissue images with high-dimensional genomic markers, identifying gene sets related to cell types and extracellular structures, and exploring how genetic variation regulates population variation in tissue morphological traits.
NATURE COMMUNICATIONS
(2021)
Article
Multidisciplinary Sciences
Bianca Dumitrascu, Soledad Villar, Dustin G. Mixon, Barbara E. Engelhardt
Summary: Single-cell technologies allow for characterization of complex cell populations at unprecedented scale and resolution. The method proposed in this study uses linear programming for supervised genetic marker selection and is implemented in a Python package, scGeneFit.
NATURE COMMUNICATIONS
(2021)
Article
Biochemical Research Methods
Jonathan Lu, Bianca Dumitrascu, Ian C. McDowell, Brian Jo, Alejandro Barrera, Linda K. Hong, Sarah M. Leichter, Timothy E. Reddy, Barbara E. Engelhardt
Summary: BETS is a method for inferring causal gene networks from gene expression time series. Its efficiency and parallelization allow for quick analysis of large datasets, and it performs competitively in benchmark testing. Validation against external data shows that BETS can accurately infer activating or inhibitory causal effects.
PLOS COMPUTATIONAL BIOLOGY
(2021)
Article
Multidisciplinary Sciences
Archit Verma, Siddhartha G. Jena, Danielle R. Isakov, Kazuhiro Aoki, Jared E. Toettcher, Barbara E. Engelhardt
Summary: This study proposes a spatiotemporal model of dynamic cell signaling based on Hawkes processes, which can capture both the autonomous behavior of single cells and the interactions of cells with their neighbors simultaneously. The model is applicable to tissues composed of heterogeneous cell types and can identify drug-induced signaling deficits and characterize signaling changes across different cell populations.
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
(2021)
Article
Health Care Sciences & Services
Niranjani Prasad, Aishwarya Mandyam, Corey Chivers, Michael Draugelis, C. William Hanson, Barbara E. Engelhardt, Krzysztof Laudanski
Summary: This study develops a data-driven clinical decision support tool that uses reinforcement learning algorithms to recommend personalized electrolyte replacement policies for ICU patients. The tool reduces excessive electrolyte replacements, improves safety, precision, efficacy, and cost, and shows robust performance across patient cohorts and hospital systems.
JOURNAL OF PERSONALIZED MEDICINE
(2022)
Article
Biochemistry & Molecular Biology
Siddhartha G. Jena, Alexander G. Goglia, Barbara E. Engelhardt
Summary: Petabytes of complex live cell and tissue imaging data are generated every year, holding great promise for understanding biology. However, the current methods for analyzing and mining these data are scattered and user-specific, hindering the possibility of unified analysis across different datasets.
BIOCHEMICAL JOURNAL
(2022)
Article
Biology
Ariel D. H. Gewirtz, F. William Townes, Barbara E. Engelhardt
Summary: This study presents a telescoping bimodal latent Dirichlet allocation (TBLDA) framework for learning shared topics across gene expression and genotype data, allowing for multiple RNA sequencing samples to correspond to a single individual's genotype. The TBLDA model successfully captures meaningful biological signal and identifies associations within and across tissue types. The model is able to handle nested structure datasets and uses raw sequencing count data for analysis.
LIFE SCIENCE ALLIANCE
(2022)
Article
Biochemical Research Methods
F. William Townes, Barbara E. Engelhardt
Summary: Nonnegative matrix factorization (NMF) is a widely used method for analyzing high-dimensional count data, but it lacks the ability to incorporate known structure between observations. We present a new model called nonnegative spatial factorization (NSF) that addresses this limitation and achieves better accuracy and prediction performance than existing methods on spatial transcriptomics datasets. We also propose a hybrid extension of NSF that combines spatial and nonspatial components to quantify spatial importance. A TensorFlow implementation of NSF is available for researchers to use.
Article
Biochemical Research Methods
Andrew Jones, F. William Townes, Didong Li, Barbara E. Engelhardt
Summary: Gaussian Process Spatial Alignment (GPSA) aligns multiple spatially resolved genomics and histology datasets, improving downstream analysis by enabling complex spatially aware analyses that are impossible or inaccurate with unaligned data.
Proceedings Paper
Computer Science, Interdisciplinary Applications
Aishwarya Mandyam, Elizabeth C. Yoo, Jeff Soules, Krzysztof Laudanski, Barbara E. Engelhardt
Summary: COP-E-CAT is open-source software designed for cleaning and organizing EHR data using comparable preprocessing strategies to enhance reproducibility and comparability. It enables users to select filtering characteristics and preprocess covariates to generate data structures for downstream analysis tasks, improving EHR accessibility for a wider range of researchers.
12TH ACM CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY, AND HEALTH INFORMATICS (ACM-BCB 2021)
(2021)