4.7 Article

dPromoter-XGBoost: Detecting promoters and strength by combining multiple descriptors and feature selection using XGBoost

Journal

METHODS
Volume 204, Issue -, Pages 215-222

Publisher

ACADEMIC PRESS INC ELSEVIER SCIENCE
DOI: 10.1016/j.ymeth.2022.01.001

Keywords

Promoters; XGBoost; K-mer word vector; PseKNC; Binary; PseDNC

Funding

  1. National Natural Science Foundation of China [61971119, 62072095, 62101353]
  2. Special Science Foundation of Quzhou [2021D004]


Promoters play a crucial role in biological processes and genetics, and detecting and distinguishing different promoters is important for understanding gene expression and specific disease mechanisms. However, experimental labeling of promoters cannot keep pace with sequencing, and existing computational models rely on limited feature representations. To address this, we propose a computational model based on multiple descriptors and feature selection.
Promoters play an irreplaceable role in biological processes and genetics and are responsible for stimulating the transcription and expression of specific genes. Promoter abnormalities have been found in some diseases, and the level of promoter-binding transcription factors can be used as a marker before a disease occurs. Hence, detecting promoters from DNA sequences has important biological significance; in particular, distinguishing strong promoters can help to elucidate differences in gene expression and the mechanisms of specific diseases. With the introduction of third-generation sequencing, experimental labeling of promoters can no longer keep pace with sequencing. Many computational models have been designed to fill this gap and identify unlabeled DNA. However, their feature representations typically rely on a single descriptor, which cannot fully reflect the information contained in the original samples. To avoid this information loss, we propose a computational model that combines multiple descriptors with feature selection to represent samples jointly. Notably, a new feature descriptor called the K-mer word vector is defined. The promoter identification model, built on multiple feature descriptors dominated by the K-mer word vector, achieves performance similar to existing methods, and its sensitivity of 85.72% indicates that it distinguishes promoters more effectively than other methods. Furthermore, performance on promoter strength prediction surpasses published methods, with an accuracy of 77.00% that greatly improves the ability to distinguish between strong and weak promoters.
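The general idea of describing one sequence with several descriptors and then keeping only the most informative columns can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a plain binary (one-hot) encoding and normalized k-mer frequencies as stand-ins, and it does not reproduce the K-mer word vector, PseKNC, or PseDNC descriptors defined in the paper. The function names are hypothetical, and the importance scores passed to `select_top_features` are assumed to come from an already trained model such as XGBoost.

```python
from itertools import product

ALPHABET = "ACGT"


def binary_descriptor(seq):
    """One-hot encode each nucleotide (4 values per position)."""
    onehot = {
        base: [1.0 if base == b else 0.0 for b in ALPHABET]
        for base in ALPHABET
    }
    features = []
    for base in seq.upper():
        # Symbols outside ACGT (for example N) map to all zeros.
        features.extend(onehot.get(base, [0.0] * 4))
    return features


def kmer_frequency_descriptor(seq, k=2):
    """Normalized frequency of every possible k-mer, in lexicographic order."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:
            counts[word] += 1
            total += 1
    return [counts[w] / total if total else 0.0 for w in kmers]


def combine_descriptors(seq, k=2):
    """Concatenate the descriptors into one feature vector (illustrative)."""
    return binary_descriptor(seq) + kmer_frequency_descriptor(seq, k)


def select_top_features(rows, importances, top_n):
    """Keep the top_n columns ranked by an externally supplied importance score.

    Assumption: `importances` holds one score per column, e.g. taken from a
    trained gradient-boosting model. Returns the reduced rows and kept indices.
    """
    ranked = sorted(
        range(len(importances)), key=lambda j: importances[j], reverse=True
    )
    keep = sorted(ranked[:top_n])
    return [[row[j] for j in keep] for row in rows], keep
```

In practice all sequences would need the same length for the binary block to line up column by column, and the reduced feature matrix would then be passed to the final classifier.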
