Revealing lineage-related signals in single-cell gene expression using random matrix theory

Publisher

NATL ACAD SCIENCES
DOI: 10.1073/pnas.1913931118

Keywords

single-cell data; cellular lineage; random matrix theory; spectral analysis

Funding

  1. James S. McDonnell Foundation
  2. Schmidt Futures
  3. Israel Council for Higher Education
  4. John Harvard Distinguished Science Fellows Program within the Faculty of Arts and Sciences Division of Science of Harvard University
  5. NSF [DMS-1715477]
  6. ONR [N00014-17-1-3029]
  7. Simons Foundation

Single-cell RNA sequencing provides rich gene expression information, and random matrix theory can be used to distinguish the different biological states it encodes. The study identifies differentiation or ancestry-related processes in single-cell data through a power-law signature in the covariance eigenvalue distribution.
Gene expression profiles of a cellular population, generated by single-cell RNA sequencing, contain rich information about biological state, including cell type, cell cycle phase, gene regulatory patterns, and location within the tissue of origin. A major challenge is to disentangle information about these different biological states from each other, and in particular to distinguish them from cell lineage, since the correlation of cellular expression patterns is necessarily contaminated by ancestry. Here, we use a recent advance in random matrix theory, discovered in the context of protein phylogeny, to identify differentiation or ancestry-related processes in single-cell data. Qin and Colwell [C. Qin, L. J. Colwell, Proc. Natl. Acad. Sci. U.S.A. 115, 690-695 (2018)] showed that ancestral relationships in protein sequences create a power-law signature in the covariance eigenvalue distribution. We demonstrate the existence of such signatures in scRNA-seq data and show that the genes driving them are indeed related to differentiation and developmental pathways. We predict the existence of similar power-law signatures for cells along linear trajectories and demonstrate this for linearly differentiating systems. Furthermore, we generalize to show that the same signatures can arise for cells along tissue-specific spatial trajectories. We illustrate these principles in diverse tissues and organisms, including the mammalian epidermis and lung, Drosophila whole-embryo, adult Hydra, dendritic cells, the intestinal epithelium, and cells undergoing induced pluripotent stem cell (iPSC) reprogramming. We show how these results can be used to interpret the gradual dynamics of lineage structure along iPSC reprogramming. Together, we provide a framework that can be used to identify signatures of specific biological processes in single-cell data without prior knowledge and to identify candidate genes associated with these processes.
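
The central quantity in the abstract, the eigenvalue distribution of the expression covariance matrix, can be illustrated with a minimal sketch. This is not the authors' pipeline: the toy lineage model (each cell divides into two daughters that inherit the parent's expression plus Gaussian noise), the function names `simulate_lineage`, `covariance_eigenvalues`, and `rank_slope`, and all parameter values are assumptions made for illustration. The sketch computes the covariance eigenvalues of a cells-by-genes matrix and fits a straight line to eigenvalue versus rank on log-log axes, which is one simple way to look for power-law-like decay.

```python
import numpy as np


def simulate_lineage(n_generations, n_genes, noise=1.0, seed=0):
    """Toy binary lineage (an assumption, not the paper's model): every cell
    divides into two daughters that copy the parent's expression vector and
    add independent Gaussian noise."""
    rng = np.random.default_rng(seed)
    cells = np.zeros((1, n_genes))
    for _ in range(n_generations):
        parents = np.repeat(cells, 2, axis=0)
        cells = parents + noise * rng.standard_normal(parents.shape)
    return cells  # shape: (2 ** n_generations, n_genes)


def covariance_eigenvalues(X):
    """Eigenvalues of the sample covariance of X (cells x genes), sorted in
    descending order. They are obtained from the singular values of the
    gene-centered matrix, so only min(n_cells, n_genes) values are returned."""
    Xc = X - X.mean(axis=0, keepdims=True)
    s = np.linalg.svd(Xc, compute_uv=False)  # already sorted descending
    return s ** 2 / (X.shape[0] - 1)


def rank_slope(eigs, top_k):
    """Least-squares slope of log(eigenvalue) against log(rank) for the
    top_k largest eigenvalues; a power law appears as a straight line."""
    ranks = np.arange(1, top_k + 1)
    slope, _intercept = np.polyfit(np.log(ranks), np.log(eigs[:top_k]), 1)
    return slope


# Hypothetical usage: 2**8 = 256 simulated cells, 100 genes.
X = simulate_lineage(n_generations=8, n_genes=100)
eigs = covariance_eigenvalues(X)
print("log-log slope of the top 20 eigenvalues:", rank_slope(eigs, 20))
```

A log-log line fit is only a rough diagnostic; on real scRNA-seq data the choice of normalization, gene filtering, and the range of ranks used for the fit would all affect the result and would need to follow the procedures described in the paper itself.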
