Article

DIscBIO: A User-Friendly Pipeline for Biomarker Discovery in Single-Cell Transcriptomics

Journal

Publisher

MDPI
DOI: 10.3390/ijms22031399

Keywords

single-cell sequencing; normalization; gene filtering; ERCC spike-ins; biomarkers; DEGs; decision trees; network analysis; Jupyter notebook; binder

Funding

  1. Knut and Alice Wallenberg Foundation
  2. Wallenberg Centre for Molecular and Translational Medicine, University of Gothenburg, Sweden
  3. Swedish Cancer Society [19-0306]
  4. Swedish Research Council [2017-01392]
  5. Swedish Childhood Cancer Foundation [2017-0043, 2020-0007]
  6. Swedish government and county councils, the ALF-agreement [716321]
  7. UiO: Life Science initiative

DIscBIO is an open-source, multi-algorithmic pipeline designed to help researchers analyze cellular sub-populations at the transcriptomic level easily, efficiently, and reproducibly. The pipeline integrates multiple scRNA-seq packages for biomarker discovery and gene enrichment analysis. A cloud version is provided for training purposes.
The growing attention toward the benefits of single-cell RNA sequencing (scRNA-seq) is leading to a myriad of computational packages for analyzing different aspects of scRNA-seq data. For researchers without advanced programming skills, it is very challenging to combine several packages to perform the desired analysis in a simple and reproducible way. Here we present DIscBIO, an open-source, multi-algorithmic pipeline for easy, efficient and reproducible analysis of cellular sub-populations at the transcriptomic level. The pipeline integrates multiple scRNA-seq packages: starting from single-cell sequencing read counts, it performs clustering and differential expression analysis, and then enables biomarker discovery with decision trees and gene enrichment analysis in a network context. DIscBIO is freely available as an R package. It can be run either in command-line mode or through a user-friendly computational pipeline using Jupyter notebooks. We showcase all pipeline features using two scRNA-seq datasets. The first dataset consists of circulating tumor cells from patients with breast cancer. The second one is a cell cycle regulation dataset in myxoid liposarcoma. All analyses are available as notebooks that integrate R code, explanatory text, output data and images in a sequential narrative. R users can use the notebooks to understand the different steps of the pipeline, and the notebooks will guide them in exploring their scRNA-seq data. We also provide a cloud version using Binder that allows the pipeline to be executed without downloading R, Jupyter or any of the packages used by the pipeline. The cloud version can serve as a tutorial for training purposes, especially for those who are not R users or have limited programming skills. However, to perform meaningful scRNA-seq analyses, all users will need to understand the implemented methods and their possible options and limitations.
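To illustrate the idea behind the decision-tree step only, the Python sketch below shows how a single tree split can nominate a candidate marker gene once cells have been clustered. It is not DIscBIO code (DIscBIO is an R package whose functions are not reproduced here). The function names `normalize_log_cpm`, `gini` and `best_split`, the log2 counts-per-million normalization, the Gini impurity criterion and the toy data are all assumptions chosen for the example. The sketch fits only a one-level tree (a "decision stump"); a full decision tree applies the same split search recursively to each side.

```python
"""Illustrative sketch (not DIscBIO code): choose a candidate marker gene
with a one-level decision tree (decision stump) on clustered cells."""
import math
from typing import Dict, List, Sequence, Tuple


def normalize_log_cpm(counts: Dict[str, List[int]]) -> Dict[str, List[float]]:
    """Scale raw read counts by each cell's library size, then take log2(CPM + 1)."""
    genes = list(counts)
    n_cells = len(counts[genes[0]])
    # Library size = total reads observed in each cell across all genes.
    lib_sizes = [sum(counts[g][c] for g in genes) for c in range(n_cells)]
    return {
        g: [math.log2(counts[g][c] / lib_sizes[c] * 1e6 + 1) for c in range(n_cells)]
        for g in genes
    }


def gini(labels: Sequence[str]) -> float:
    """Gini impurity of a group of cluster labels (0.0 means a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(k) / n) ** 2 for k in set(labels))


def best_split(
    expr: Dict[str, List[float]], labels: List[str]
) -> Tuple[str, float, float]:
    """Return (gene, threshold, weighted Gini impurity) of the best single split.

    Cells with expression <= threshold go left, the rest go right. Genes are
    scanned in dict order and ties keep the first gene found. If no gene has
    two distinct values, ("", 0.0, inf) is returned.
    """
    n = len(labels)
    best = ("", 0.0, float("inf"))
    for gene, values in expr.items():
        distinct = sorted(set(values))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for low, high in zip(distinct, distinct[1:]):
            t = (low + high) / 2
            left = [lab for v, lab in zip(values, labels) if v <= t]
            right = [lab for v, lab in zip(values, labels) if v > t]
            # Impurity of the split, weighted by the size of each side.
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (gene, t, score)
    return best


if __name__ == "__main__":
    # Hypothetical toy data: log-normalized expression of 3 genes in 6 cells
    # that were already assigned to clusters A and B.
    expr = {
        "GENE1": [2.0, 3.1, 2.5, 2.8, 2.2, 3.0],
        "GENE2": [7.5, 8.1, 7.9, 1.2, 0.8, 1.5],
        "GENE3": [4.0, 1.0, 3.5, 3.9, 1.2, 2.0],
    }
    labels = ["A", "A", "A", "B", "B", "B"]
    gene, threshold, impurity = best_split(expr, labels)
    print(gene, threshold, impurity)  # prints: GENE2 4.5 0.0
```

Here GENE2 separates the two toy clusters perfectly at a threshold of 4.5, so it would be reported as a candidate marker. Gini impurity is the criterion used by CART-style trees; other tree learners use different split criteria (for example, information gain ratio), so the selected genes can differ between methods. A gene chosen this way is only a candidate and would still need the differential expression and enrichment checks the pipeline describes.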