Article Data Paper

The Lair: a resource for exploratory analysis of published RNA-Seq data

Journal

BMC BIOINFORMATICS
Volume 17

Publisher

BMC
DOI: 10.1186/s12859-016-1357-2

Keywords

RNA-Seq; Sequence read archive; Exploratory data analysis; Shiny; Interactive visualization; Reanalysis; Reproducibility; Kallisto; Sleuth

Funding

  1. NIH [R01 HG006129, R01 DK094699, R01 HG008164]


Increased emphasis on the reproducibility of published research over the last few years has led to the large-scale archiving of sequencing data. While these data can, in theory, be used to reproduce the results in papers, they are difficult to use in practice. We introduce a series of tools for processing and analyzing RNA-Seq data in the Sequence Read Archive that, together, have allowed us to build an easily extendable resource for analyzing the data underlying published papers. Our system makes data exploration accessible to users without technical expertise.
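The abstract describes the processing step only at a high level, and the keywords name kallisto and sleuth. As a rough illustration, the sketch below assembles the command lines for reprocessing one SRA run: download with the SRA Toolkit's `fasterq-dump`, then quantify with `kallisto quant` using bootstraps so the output can later be analyzed with sleuth. This is a minimal sketch under stated assumptions, not the resource's actual pipeline: the `Run` record, the helper names, the placeholder accessions, the bootstrap count, and the FASTQ file naming are all hypothetical.

```python
import shlex
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional


@dataclass
class Run:
    """One SRA run to reprocess (hypothetical record type)."""
    accession: str                           # SRA run accession, e.g. "SRR..."
    paired: bool                             # True for paired-end libraries
    fragment_length: Optional[float] = None  # kallisto needs -l/-s for single-end data
    fragment_sd: Optional[float] = None


def download_cmd(run: Run, outdir: Path) -> List[str]:
    """fasterq-dump (SRA Toolkit) command that writes the run's reads as FASTQ."""
    return ["fasterq-dump", "--split-3", run.accession, "-O", str(outdir)]


def fastq_paths(run: Run, fastq_dir: Path) -> List[Path]:
    """Expected FASTQ files (assumed --split-3 naming; check your toolkit version)."""
    if run.paired:
        return [fastq_dir / f"{run.accession}_1.fastq",
                fastq_dir / f"{run.accession}_2.fastq"]
    return [fastq_dir / f"{run.accession}.fastq"]


def quant_cmd(run: Run, index: Path, fastq_dir: Path, outdir: Path,
              bootstraps: int = 100) -> List[str]:
    """kallisto quant command; bootstraps give sleuth its uncertainty estimates."""
    cmd = ["kallisto", "quant", "-i", str(index),
           "-o", str(outdir / run.accession), "-b", str(bootstraps)]
    if not run.paired:
        # Single-end reads carry no fragment-length information, so it must be supplied.
        if run.fragment_length is None or run.fragment_sd is None:
            raise ValueError(f"{run.accession}: single-end runs need -l and -s")
        cmd += ["--single", "-l", str(run.fragment_length),
                "-s", str(run.fragment_sd)]
    cmd += [str(p) for p in fastq_paths(run, fastq_dir)]
    return cmd


if __name__ == "__main__":
    # Placeholder accessions and paths, for illustration only.
    runs = [Run("SRR0000001", paired=True),
            Run("SRR0000002", paired=False, fragment_length=200, fragment_sd=20)]
    for run in runs:
        print(shlex.join(download_cmd(run, Path("fastq"))))
        print(shlex.join(quant_cmd(run, Path("transcripts.idx"),
                                   Path("fastq"), Path("kallisto_out"))))
```

Building the commands as argument lists keeps the plan inspectable before anything is executed. The per-run kallisto output directories are the inputs sleuth expects, one per sample.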



