4.6 Article

ReFeaFi: Genome-wide prediction of regulatory elements driving transcription initiation

Journal

PLOS COMPUTATIONAL BIOLOGY
Volume 17, Issue 9

Publisher

PUBLIC LIBRARY SCIENCE
DOI: 10.1371/journal.pcbi.1009376

Accurate identification of regulatory elements like promoters and enhancers is crucial for understanding gene expression patterns. While many attempts have been made to develop computational methods, reliable tools for analyzing long genomic sequences are still lacking. To address this issue, the authors propose a dynamic negative set updating scheme and use a two-model approach, achieving good performance at the genome level.
Author summary

Identification of regulatory elements (promoters and enhancers) is important for understanding gene expression patterns. The set of known promoters and enhancers is incomplete for non-model organisms, and even the human genome still contains unannotated regions, such as alternative promoters of known genes or promoters that are active only in a small fraction of cells or under specific conditions. Despite advances in experimental techniques, annotating regulatory regions remains expensive and laborious, and computational methods can speed up the process by providing candidates for validation. We developed an easy-to-use tool for annotating regulatory regions in eukaryotic genomes. The method reduces the number of false positives by including difficult samples in the training set. It consists of two deep learning models: one scans the genome and identifies putative regulatory regions, while the other pinpoints the Transcription Start Site (TSS) location within each identified region. The predicted regions were validated using a reporter assay, which revealed previously unknown regulatory regions in the human genome. The trained model achieved good genome-wide performance and was supported by meaningful extracted biological features.

Regulatory elements control gene expression through transcription initiation (promoters) and by enhancing transcription from distant regions (enhancers). Accurate identification of regulatory elements is fundamental for annotating genomes and understanding gene expression patterns. While there have been many attempts to develop computational methods for promoter and enhancer identification, reliable tools for analyzing long genomic sequences are still lacking. Prediction methods often perform poorly at the genome-wide scale because negatives are far more numerous in a genome than in the training sets. To address this issue, we propose a dynamic negative set updating scheme with a two-model approach, using one model for scanning the genome and the other for testing candidate positions. The developed method achieves good genome-level performance and remains robust when applied to other vertebrate species without re-training. Moreover, the unannotated regulatory regions predicted in the human genome are enriched for disease-associated variants, suggesting that they are potentially true regulatory elements rather than false positives. We validated high-scoring predictions that had been counted as false positives using a reporter assay, and all tested candidates were successfully validated, demonstrating the ability of our method to discover novel human regulatory regions.
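
The abstract does not spell out how the negative set is updated, so the following is only a minimal sketch of one common way to realize such a scheme: iterative hard-negative mining, where positions that the current model scores highly but that are not known positives are added to the negative set before re-fitting. All names here (`mine_hard_negatives`, `train_with_dynamic_negatives`, `make_toy_fitter`) and the one-feature toy model are illustrative assumptions, not the authors' implementation or API.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

Scorer = Callable[[int], float]                  # position -> score in [0, 1]
Fitter = Callable[[Set[int], Set[int]], Scorer]  # (positives, negatives) -> scorer


def mine_hard_negatives(score: Scorer, positions: Iterable[int],
                        positives: Set[int], negatives: Set[int],
                        threshold: float = 0.5, max_new: int = 100) -> List[int]:
    """Return the top-scoring positions that are currently false positives:
    scored at or above the threshold, not known positives, not yet negatives."""
    false_pos = [p for p in positions
                 if p not in positives and p not in negatives
                 and score(p) >= threshold]
    false_pos.sort(key=score, reverse=True)
    return false_pos[:max_new]


def train_with_dynamic_negatives(fit: Fitter, positions: Iterable[int],
                                 positives: Set[int], initial_negatives: Set[int],
                                 rounds: int = 5, threshold: float = 0.5,
                                 max_new: int = 100) -> Tuple[Scorer, Set[int]]:
    """Alternate between fitting and growing the negative set with hard negatives."""
    positions = list(positions)
    negatives = set(initial_negatives)
    score = fit(positives, negatives)
    for _ in range(rounds):
        hard = mine_hard_negatives(score, positions, positives, negatives,
                                   threshold, max_new)
        if not hard:          # no false positives left on the scanned positions
            break
        negatives.update(hard)
        score = fit(positives, negatives)
    return score, negatives


def make_toy_fitter(features: Dict[int, float]) -> Fitter:
    """Hypothetical stand-in for a real model: a cut-off on a single feature,
    placed halfway between the weakest positive and the strongest negative."""
    def fit(positives: Set[int], negatives: Set[int]) -> Scorer:
        cut = (min(features[p] for p in positives)
               + max(features[n] for n in negatives)) / 2

        def score(pos: int) -> float:
            return min(1.0, max(0.0, 0.5 + features[pos] - cut))
        return score
    return fit


# Toy "genome" of 8 positions; positions 2 and 5 are the known positives.
features = dict(enumerate([0.1, 0.2, 0.9, 0.3, 0.6, 0.95, 0.55, 0.15]))
score, negatives = train_with_dynamic_negatives(
    make_toy_fitter(features), range(8), positives={2, 5}, initial_negatives={0})
print(sorted(negatives))  # [0, 4, 6]
```

In this toy run the first fit places the cut-off at 0.5, so positions 4 and 6 are scored as positives; adding them to the negative set moves the cut-off to 0.75 and the second scan finds no further false positives. In a real setting the scan would run over genome windows and `fit` would re-train a neural network, which is where the scheme's cost lies.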
