4.7 Article

Detection of transcription factors binding to methylated DNA by deep recurrent neural network

Journal

BRIEFINGS IN BIOINFORMATICS
Volume 23, Issue 1, Pages -

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bib/bbab533

Keywords

transcription factors; methylated DNA; deep recurrent neural network; tripeptide word vector; tripeptide

Funding

  1. National Natural Science Foundation of China [61771165, 62072095]
  2. National Key R&D Program of China [2021YFC 2100100]
  3. Innovation Project of State Key Laboratory of Tree Genetics and Breeding (Northeast Forestry University) [2019A04]

Transcription factors (TFs) are proteins involved in gene expression regulation, and recent studies have shown that some TFs can interact with methylated DNA fragments. This study presents a machine learning-based approach for quickly identifying TFs that can bind to methylated DNA. The proposed two-step model, based on tripeptide word vector features and a recurrent neural network, reaches accuracies of 86.63% for identifying TFs and 73.59% for predicting their binding to methylated DNA on an independent dataset.
Transcription factors (TFs) are proteins specifically involved in gene expression regulation. It is generally accepted in epigenetics that methylated nucleotides can prevent TFs from binding to DNA fragments. However, recent studies have confirmed that some TFs are able to interact with methylated DNA fragments to further regulate gene expression. Although biochemical experiments can recognize TFs that bind to methylated DNA sequences, these wet-lab methods are time-consuming and expensive. Machine learning methods offer a good way to identify these TFs quickly without experimental materials. Thus, this study aims to design a robust predictor to detect methylated DNA-bound TFs. We first proposed using a tripeptide word vector feature to represent protein samples. Subsequently, a two-step computational model was designed based on a recurrent neural network with long short-term memory. The first-step predictor was used to discriminate transcription factors from non-transcription factors. Once a protein was predicted to be a TF, the second-step predictor was used to judge whether that TF can bind to methylated DNA. In the independent dataset test, the accuracies of the first step and the second step were 86.63% and 73.59%, respectively. In addition, a statistical analysis of the distribution of tripeptides in the training samples showed that the position and number of some tripeptides in the sequence could affect the binding of TFs to methylated DNA. Finally, a free web server based on the proposed model was established and is available at https://bioinfor.nefu.edu.cn/TFPM/.
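
The two-step workflow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`tripeptides`, `tripeptide_profile`, `two_step_predict`) are hypothetical, the use of overlapping tripeptides is an assumption (the abstract does not say how sequences are split), and the two classifiers are passed in as plain callables standing in for the trained LSTM predictors, which are not reproduced here.

```python
from collections import Counter
from typing import Callable, Dict, List

Tokens = List[str]


def tripeptides(sequence: str, overlapping: bool = True) -> Tokens:
    """Split a protein sequence into tripeptide 'words'.

    Assumption: overlapping windows (stride 1) by default; pass
    overlapping=False for consecutive, non-overlapping windows (stride 3).
    """
    seq = sequence.upper()
    step = 1 if overlapping else 3
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, step)]


def tripeptide_profile(sequence: str) -> Dict[str, Dict[str, object]]:
    """Record the count and start positions of each tripeptide, the two
    quantities the abstract's statistical analysis refers to."""
    tokens = tripeptides(sequence)
    counts = Counter(tokens)
    positions: Dict[str, List[int]] = {}
    for index, token in enumerate(tokens):
        positions.setdefault(token, []).append(index)
    return {t: {"count": counts[t], "positions": positions[t]} for t in counts}


def two_step_predict(
    sequence: str,
    is_tf: Callable[[Tokens], bool],
    binds_methylated_dna: Callable[[Tokens], bool],
) -> str:
    """Cascade: step 1 decides TF vs. non-TF; step 2 runs only for
    predicted TFs. Both callables are hypothetical stand-ins for models."""
    tokens = tripeptides(sequence)
    if not is_tf(tokens):
        return "non-TF"
    if binds_methylated_dna(tokens):
        return "TF, predicted to bind methylated DNA"
    return "TF, not predicted to bind methylated DNA"


if __name__ == "__main__":
    # Toy stand-in rules, purely for demonstration; they carry no biology.
    toy_is_tf = lambda tokens: "KRL" in tokens
    toy_binds = lambda tokens: len(tokens) > 3
    print(tripeptides("MKRLAQ"))
    print(two_step_predict("MKRLAQ", toy_is_tf, toy_binds))
```

In a real pipeline each tripeptide token would be mapped to a learned word vector before being fed to the recurrent network. The cascade structure is the key design point: the second predictor never sees proteins that the first one rejects.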
