Article Proceedings Paper

DeepHistone: a deep learning approach to predicting histone modifications

Journal

BMC GENOMICS
Volume 20

Publisher

BMC
DOI: 10.1186/s12864-019-5489-4

Keywords

Histone modification; Chromatin accessibility; Deep learning; Sequence analysis; Genetic variation

Funding

  1. National Key Research and Development Program of China [2018YFC0910404]
  2. National Natural Science Foundation of China [61873141, 61721003, 61573207, U1736210, 71871019, 71471016, 31501081]
  3. Tsinghua-Fuzhou Institute for Data Technology

Motivation: Quantitative detection of histone modifications has emerged in recent years as a major means of understanding biological processes such as chromosome packaging, transcriptional activation, and DNA damage. However, high-throughput experimental techniques such as ChIP-seq are usually expensive and time-consuming, prohibiting the establishment of a histone modification landscape for hundreds of cell types across dozens of histone markers. These limitations call for computational methods that complement experimental approaches to large-scale analysis of histone modifications.

Results: We proposed a deep learning framework that integrates sequence information and chromatin accessibility data for the accurate prediction of modification sites specific to different histone markers. Our method, named DeepHistone, outperformed several baseline methods in a series of comprehensive validation experiments, not only within an epigenome but also across epigenomes. In addition, the sequence signatures automatically extracted by our method were consistent with known transcription factor binding sites, thereby giving insights into regulatory signatures of histone modifications. As an application, our method was shown to be able to distinguish functional single nucleotide polymorphisms from their nearby genetic variants, and thus has the potential to be used for exploring the functional implications of putative disease-associated genetic variants.

Conclusions: DeepHistone demonstrated the possibility of using a deep learning framework to integrate DNA sequence and experimental data for predicting epigenomic signals. With its state-of-the-art performance, DeepHistone is expected to shed light on a variety of epigenomic studies. DeepHistone is freely available at https://github.com/QijinYin/DeepHistone.
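
As a rough illustration of what integrating sequence information with chromatin accessibility data can mean at the input level, the sketch below one-hot encodes a DNA sequence and appends a per-base accessibility track as a fifth row. The function names, the 5 x L layout, and the toy values are assumptions made for this example; they are not taken from the DeepHistone paper or repository, and the actual model may well process the two inputs separately.

```python
# Illustrative sketch only: names and layout are assumptions,
# not the encoding used by the DeepHistone code base.
BASES = "ACGT"


def one_hot(seq):
    """Return a 4 x len(seq) matrix with rows ordered A, C, G, T.

    Unknown bases such as N produce an all-zero column.
    """
    seq = seq.upper()
    return [[1.0 if base == b else 0.0 for base in seq] for b in BASES]


def build_input(seq, accessibility):
    """Stack the one-hot sequence matrix with an accessibility row (5 x L)."""
    if len(seq) != len(accessibility):
        raise ValueError("sequence and accessibility track must have equal length")
    return one_hot(seq) + [[float(x) for x in accessibility]]


# Toy example: a 5-bp window with a hypothetical accessibility signal.
features = build_input("ACGTN", [0.0, 0.5, 1.0, 0.5, 0.0])
```

A matrix of this shape is the kind of input a convolutional network could consume; stacking the two sources into one array is only one of several possible designs.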

Recommended

Article Microbiology

Determine independent gut microbiota-diseases association by eliminating the effects of human lifestyle factors

Congmin Zhu, Xin Wang, Jianchu Li, Rui Jiang, Hui Chen, Ting Chen, Yuqing Yang

Summary: The study reveals the significant impact of gut microbiota on human disease risk, proposing the use of machine learning models to accurately infer the associations between human variables and gut microbiota. Analysis on the American Gut Project dataset shows distinct association strengths between gut microbiota and various diseases, with significant improvement in classification performance for diseases like inflammatory bowel disease by adding gut microbiota into human variables.

BMC MICROBIOLOGY (2022)

Article Biochemistry & Molecular Biology

Highly Regional Genes: graph-based gene selection for single-cell RNA-seq data

Yanhong Wu, Qifan Hu, Shicheng Wang, Changyi Liu, Yiran Shan, Wenbo Guo, Rui Jiang, Xiaowo Wang, Jin Gu

Summary: In this study, a feature selection method called Highly Regional Genes (HRG) is proposed based on the cell-cell similarity network to identify informative genes that exhibit regional expression patterns. The HRG method demonstrates high accuracy and robustness compared to other unsupervised methods, and it improves the performance of cell clustering and gene correlation analysis.

JOURNAL OF GENETICS AND GENOMICS (2022)

Editorial Material Multidisciplinary Sciences

Toward a unified information framework for cell atlas assembly

Sijie Chen, Yanting Luo, Haoxiang Gao, Fanhong Li, Jiaqi Li, Yixin Chen, Renke You, Hairong Lv, Kui Hua, Rui Jiang, Xuegong Zhang

Summary: This perspective discusses the need and directions for the development of a unified information framework to enable the assembly of cell atlases and revolutionize medical research on the virtual body of assembled cell systems.

NATIONAL SCIENCE REVIEW (2022)

Article Biochemical Research Methods

scGraph: a graph neural network-based approach to automatically identify cell types

Qijin Yin, Qiao Liu, Zhuoran Fu, Wanwen Zeng, Boheng Zhang, Xuegong Zhang, Rui Jiang, Hairong Lv

Summary: Single-cell technologies have significantly advanced biological research, and the scGraph algorithm improves cell-type identification performance by leveraging gene interaction relationships, providing important insights into cellular characteristics.

BIOINFORMATICS (2022)

Article Biochemical Research Methods

DualGCN: a dual graph convolutional network model to predict cancer drug response

Tianxing Ma, Qiao Liu, Haochen Li, Mu Zhou, Rui Jiang, Xuegong Zhang

Summary: Drug resistance is a major challenge in cancer therapy, and studying cancer cell lines has limitations. This article proposes a new method, DualGCN, to predict cancer drug response using a dual graph convolutional network model, which outperforms existing methods without using large-scale SNV data. This method has the potential to be applied to clinical and single-cell tumor samples, advancing precision medicine.

BMC BIOINFORMATICS (2022)

Article Genetics & Heredity

DeepCAGE: Incorporating Transcription Factors in Genome-wide Prediction of Chromatin Accessibility

Qiao Liu, Kui Hua, Xuegong Zhang, Wing Hung Wong, Rui Jiang

Summary: DeepCAGE is a deep learning framework that accurately predicts chromatin accessible regions in a variety of cell types, and exhibits superior performance in the classification and regression of chromatin accessibility signals.

GENOMICS PROTEOMICS & BIOINFORMATICS (2022)

Article Multidisciplinary Sciences

Unfolding the genotype-to-phenotype black box of cardiovascular diseases through cross-scale modeling

Xi Xi, Haochen Li, Shengquan Chen, Tingting Lv, Tianxing Ma, Rui Jiang, Ping Zhang, Wing Hung Wong, Xuegong Zhang

Summary: This article introduces a machine-learning-based cross-scale framework GRPath to decipher putative causal paths from genetic variants to disease phenotypes. Applied to cardiovascular diseases, a large number of pcPaths were identified, providing new insights into genetic variants for types of heart failure.

ISCIENCE (2022)

Article Biology

REDDA: Integrating multiple biological relations to heterogeneous graph neural network for drug-disease association prediction

Yaowen Gu, Si Zheng, Qijin Yin, Rui Jiang, Jiao Li

Summary: Computational drug repositioning is an effective method to find new indications for existing drugs. The heterogeneous graph neural network REDDA enhances drug-disease association prediction by utilizing biological entity relations. Experimental results show that REDDA outperforms other methods, indicating its potential in drug development.

COMPUTERS IN BIOLOGY AND MEDICINE (2022)

Article Biochemistry & Molecular Biology

HiChIPdb: a comprehensive database of HiChIP regulatory interactions

Wanwen Zeng, Qiao Liu, Qijin Yin, Rui Jiang, Wing Hung Wong

Summary: HiChIPdb is a comprehensive database based on HiChIP interactions, which allows for standardized categorization and annotation of functional interactions across diverse cell types and tissues, and provides a unified pipeline for data analysis.

NUCLEIC ACIDS RESEARCH (2023)

Article Cell Biology

SINFONIA: Scalable Identification of Spatially Variable Genes for Deciphering Spatial Domains

Rui Jiang, Zhen Li, Yuhang Jia, Siyu Li, Shengquan Chen

Summary: Recent advances in spatial transcriptomics have led to a better understanding of tissue organization. In this study, we propose a scalable method called SINFONIA for identifying spatially variable genes using ensemble strategies. The method, implemented in Python, demonstrates superior performance compared to baseline methods in various evaluation metrics, and can be easily integrated into existing analysis workflows, aiding the analysis of spatial transcriptomics.

CELLS (2023)

Article Multidisciplinary Sciences

Comprehensive tissue deconvolution of cell-free DNA by deep learning for disease diagnosis and monitoring

Shuo Li, Weihua Zeng, Xiaohui Ni, Qiao Liu, Wenyuan Li, Mary L. Stackpole, Yonggang Zhou, Arjan Gower, Kostyantyn Krysan, Preeti Ahuja, David S. Lu, Steven S. Raman, William Hsu, Denise R. Aberle, Clara E. Magyar, Samuel W. French, Steven-Huy B. Han, Edward B. Garon, Vatche G. Agopian, Wing Hung Wong, Steven M. Dubinett, Xianghong Jasmine Zhou

Summary: Plasma cell-free DNA (cfDNA) is a noninvasive biomarker that can be used to detect abnormal cell death due to diseases. By establishing a comprehensive tissue methylation atlas and utilizing the cfSort deep-learning model, the performance of tissue deconvolution in cfDNA can be significantly improved, enabling disease detection and longitudinal treatment monitoring.

PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA (2023)

Article Biotechnology & Applied Microbiology

Regulatory analysis of single cell multiome gene expression and chromatin accessibility data with scREG

Zhana Duren, Fengge Chang, Fnu Naqing, Jingxue Xin, Qiao Liu, Wing Hung Wong

Summary: scREG is a dimension reduction methodology for single cell multiome data based on the concept of cis-regulatory potential, used for constructing subpopulation-specific cis-regulatory networks. It demonstrates increased accuracy in inferring regulatory networks and enrichment of GWAS variants in cis-regulatory elements for specific diseases.

GENOME BIOLOGY (2022)

Article Computer Science, Artificial Intelligence

Cell type annotation of single-cell chromatin accessibility data via supervised Bayesian embedding

Xiaoyang Chen, Shengquan Chen, Shuang Song, Zijing Gao, Lin Hou, Xuegong Zhang, Hairong Lv, Rui Jiang

Summary: The authors propose a probabilistic generative model called EpiAnno to automatically annotate single-cell chromatin accessibility sequencing (scCAS) data. The model is validated on multiple datasets and demonstrates advantages in interpretable embedding and biological implications.

NATURE MACHINE INTELLIGENCE (2022)
