4.7 Article

Is There Any Sequence Feature in the RNA Pseudouridine Modification Prediction Problem?

Journal

MOLECULAR THERAPY-NUCLEIC ACIDS
Volume 19, Issue -, Pages 293-303

Publisher

CELL PRESS
DOI: 10.1016/j.omtn.2019.11.014

Keywords

-

Funding

  1. Natural Science Foundation of China [61902259]
  2. Natural Science Foundation of Guangdong Province [2018A0303130084]
  3. Scientific Research Foundation in Shenzhen [JCYJ20170818100431895, JCYJ20180305163701198, JCYJ20180306172207178]

Pseudouridine (Psi) is the most abundant RNA modification and has been found in many kinds of RNAs, including snRNA, rRNA, tRNA, mRNA, and snoRNA. Thus, Psi sites play a significant role in basic research and drug development. Although some experimental techniques have been developed to identify Psi sites, they are expensive and time-consuming, especially in the post-genomic era with the explosive growth of known RNA sequences. Thus, highly accurate computational methods are urgently required to quickly detect Psi sites in uncharacterized RNA sequences. Several predictors have been proposed using multifarious features, but their evaluated performances are still unsatisfactory. In this study, we first identified Psi sites for H. sapiens, S. cerevisiae, and M. musculus using the sequence features from the bi-profile Bayes (BPB) method based on the random forest (RF) and support vector machine (SVM) algorithms, where the performances were evaluated using 5-fold cross-validation and independent tests. It was found that the SVM-based accuracies were 3.55% and 5.09% lower than those of the iPseU-CNN predictor for the H_990 and S_628 datasets, respectively. Results at almost the same level were obtained for the M_994 dataset and the independent H_200 dataset, with a 5.0% improvement for S_200. Then, three different kinds of features, including basic Kmer, general parallel correlation pseudo-dinucleotide composition (PC-PseDNC-General), and nucleotide chemical property (NCP) and nucleotide density (ND) from the iRNA-PseU method, were combined with BPB to show their comprehensive performances, where the effective features were selected by the max-relevance-max-distance (MRMD) method. The best evaluated accuracies of the combined features for the S_628 and M_994 datasets were 70.54% and 72.45%, which were 2.39% and 0.65% higher than those of iPseU-CNN. For the S_200 dataset, the accuracy also improved by 8%, from 69% to 77%. However, there was no obvious improvement for H. sapiens, for which the accuracies were approximately 63.23% and 72.0% for the H_990 and H_200 datasets, respectively. The overall performances for Psi identification using BPB features as well as the combined features were not obviously improved. Although several kinds of feature extraction methods based on RNA sequence information have been applied to construct predictors in previous studies, the corresponding accuracies are generally in the range of 60%-70%. Thus, researchers need to reconsider whether there is any sequence feature in the RNA Psi modification prediction problem.
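To make the BPB features mentioned above more concrete, the following is a minimal sketch, assuming the common formulation in which each position of a fixed-length sequence window is encoded by the frequency of its nucleotide at that position in the positive training set, followed by the same quantity from the negative training set. The function names, the toy sequences, and the pseudocount smoothing are illustrative assumptions and are not taken from the paper.

```python
from collections import Counter

ALPHABET = "ACGU"  # RNA nucleotides; other symbols (e.g. N) are not handled here


def position_profile(seqs, pseudocount=1.0):
    """Per-position nucleotide frequencies for equal-length sequences.

    The pseudocount is an assumption of this sketch; it avoids zero
    frequencies for nucleotides that never occur at a position.
    """
    length = len(seqs[0])
    total = len(seqs) + pseudocount * len(ALPHABET)
    profile = []
    for i in range(length):
        counts = Counter(s[i] for s in seqs)
        profile.append({nt: (counts[nt] + pseudocount) / total for nt in ALPHABET})
    return profile


def bpb_encode(seq, pos_profile, neg_profile):
    """Concatenate positive-profile and negative-profile values for seq.

    A window of length n yields a feature vector of length 2n.
    """
    pos_part = [pos_profile[i][nt] for i, nt in enumerate(seq)]
    neg_part = [neg_profile[i][nt] for i, nt in enumerate(seq)]
    return pos_part + neg_part


# Toy data (hypothetical): windows around modified vs. unmodified uridines.
positives = ["AUG", "AUC"]
negatives = ["GUA", "CUA"]

pos_profile = position_profile(positives)
neg_profile = position_profile(negatives)
features = bpb_encode("AUG", pos_profile, neg_profile)
```

In a pipeline like the one described, vectors of this kind would be computed for every training window and passed to a classifier such as an SVM or random forest, with the profiles estimated only from the training folds during cross-validation.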
