☆ 4.7 Article

Is There Any Sequence Feature in the RNA Pseudouridine Modification Prediction Problem?

MOLECULAR THERAPY-NUCLEIC ACIDS (2020)

Journal

MOLECULAR THERAPY-NUCLEIC ACIDS

Volume 19, Issue -, Pages 293-303

Publisher

CELL PRESS

DOI: 10.1016/j.omtn.2019.11.014

Category

Medicine, Research & Experimental

Funding

Natural Science Foundation of China [61902259]
Natural Science Foundation of Guangdong Province [2018A0303130084]
Scientific Research Foundation in Shenzhen [JCYJ20170818100431895, JCYJ20180305163701198, JCYJ20180306172207178]

Abstract

Pseudouridine (Psi) is the most abundant RNA modification and has been found in many kinds of RNAs, including snRNA, rRNA, tRNA, mRNA, and snoRNA. Thus, Psi sites play a significant role in basic research and drug development. Although some experimental techniques have been developed to identify Psi sites, they are expensive and time consuming, especially in the post-genomic era with the explosive growth of known RNA sequences. Thus, highly accurate computational methods are urgently required to quickly detect Psi sites on uncharacterized RNA sequences. Several predictors have been proposed using multifarious features, but their evaluated performances are still unsatisfactory. In this study, we first identified Psi sites for H. sapiens, S. cerevisiae, and M. musculus using the sequence features from the bi-profile Bayes (BPB) method with the random forest (RF) and support vector machine (SVM) algorithms, where the performances were evaluated using 5-fold cross-validation and independent tests. The SVM-based accuracies were 3.55% and 5.09% lower than those of the iPseU-CNN predictor for the H_990 and S_628 datasets, respectively. Results at almost the same level were obtained for M_994 and the independent H_200 dataset, and a 5.0% improvement was even observed for S_200. Then, three different kinds of features, including basic Kmer, general parallel correlation pseudo-dinucleotide composition (PC-PseDNC-General), and nucleotide chemical property (NCP) and nucleotide density (ND) from the iRNA-PseU method, were combined with BPB to assess their comprehensive performance, where the effective features were selected by the max-relevance-max-distance (MRMD) method. The best evaluated accuracies of the combined features for the S_628 and M_994 datasets reached 70.54% and 72.45%, which were 2.39% and 0.65% higher than iPseU-CNN. For the S_200 dataset, the accuracy also improved by 8 percentage points, from 69% to 77%. However, there was no obvious improvement for H. sapiens, for which the accuracy was approximately 63.23% and 72.0% for the H_990 and H_200 datasets, respectively. Overall, the performance of Psi identification was not obviously improved by either the BPB features or the combined features. Although several kinds of feature extraction methods based on RNA sequence information have been applied to construct predictors in previous studies, the corresponding accuracies are generally in the range of 60%-70%. Thus, researchers need to reconsider whether there is any sequence feature in the RNA Psi modification prediction problem.
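For readers unfamiliar with the bi-profile Bayes (BPB) encoding mentioned in the abstract, the following is a minimal sketch of the general idea, not the authors' implementation. It assumes equal-length RNA windows and encodes a sequence of length n as 2n values: the frequency of each observed nucleotide at its position in the positive training set, followed by the same frequencies from the negative training set. The function names (position_profile, bpb_encode) and the toy sequences are hypothetical and chosen only for illustration.

```python
from collections import Counter

ALPHABET = "ACGU"


def position_profile(seqs):
    """Per-position nucleotide frequencies for a list of equal-length sequences."""
    length = len(seqs[0])
    profile = []
    for i in range(length):
        counts = Counter(s[i] for s in seqs)
        profile.append({nt: counts[nt] / len(seqs) for nt in ALPHABET})
    return profile


def bpb_encode(seq, pos_profile, neg_profile):
    """Return 2n features: positive-profile values, then negative-profile values."""
    pos = [pos_profile[i][nt] for i, nt in enumerate(seq)]
    neg = [neg_profile[i][nt] for i, nt in enumerate(seq)]
    return pos + neg


# Toy, made-up training windows (not data from the paper).
positives = ["AUG", "AUC", "GUC"]
negatives = ["CUA", "CUG", "GUA"]

pos_profile = position_profile(positives)
neg_profile = position_profile(negatives)

features = bpb_encode("AUC", pos_profile, neg_profile)
print(features)
```

The resulting vectors could then be passed to an RF or SVM classifier. In a cross-validation setting, the two profiles would normally be built from the training folds only, so that the held-out sequences do not influence their own encoding.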
