4.6 Article

Accurate Prediction of Transposon-Derived piRNAs by Integrating Various Sequential and Physicochemical Features

期刊

PLOS ONE
卷 11, 期 4, 页码 -

出版社

PUBLIC LIBRARY SCIENCE
DOI: 10.1371/journal.pone.0153268

关键词

-

资金

  1. National Natural Science Foundation of China [61271337, 61103126, 61572368]
  2. Shenzhen Development Foundation [JCYJ20130401160028781]
  3. China Scholarship Council [201406275015]
  4. Natural Science Foundation of Hubei Province, China [ZRY2014000901]

向作者/读者索取更多资源

Background Piwi-interacting RNA (piRNA) is the largest class of small non-coding RNA molecules. The transposon-derived piRNA prediction can enrich the research contents of small ncRNAs as well as help to further understand generation mechanism of gamete. Methods In this paper, we attempt to differentiate transposon-derived piRNAs from non-piRNAs based on their sequential and physicochemical features by using machine learning methods. We explore six sequence-derived features, i.e. spectrum profile, mismatch profile, subsequence profile, position-specific scoring matrix, pseudo dinucleotide composition and local structure-sequence triplet elements, and systematically evaluate their performances for transposonderived piRNA prediction. Finally, we consider two approaches: direct combination and ensemble learning to integrate useful features and achieve high-accuracy prediction models. Results We construct three datasets, covering three species: Human, Mouse and Drosophila, and evaluate the performances of prediction models by 10-fold cross validation. In the computational experiments, direct combination models achieve AUC of 0.917, 0.922 and 0.992 on Human, Mouse and Drosophila, respectively; ensemble learning models achieve AUC of 0.922, 0.926 and 0.994 on the three datasets. Conclusions Compared with other state-of-the-art methods, our methods can lead to better performances. In conclusion, the proposed methods are promising for the transposon-derived piRNA prediction. The source codes and datasets are available in S1 File.

作者

我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。

评论

主要评分

4.6
评分不足

次要评分

新颖性
-
重要性
-
科学严谨性
-
评价这篇论文

推荐

暂无数据
暂无数据