Article

i6mA-Fuse: improved and robust prediction of DNA 6 mA sites in the Rosaceae genome by fusing multiple feature representation

Journal

PLANT MOLECULAR BIOLOGY
Volume 103, Issue 1-2, Pages 225-234

Publisher

SPRINGER
DOI: 10.1007/s11103-020-00988-y

Keywords

DNA 6 mA; Sequence analysis; Feature encoding; Machine learning

Funding

  1. Grants-in-Aid for Scientific Research [19H04208] Funding Source: KAKEN


Key message

Existing prediction models are species-specific and therefore unsuitable for identifying 6mA in the Rosaceae genome, so a new predictor is needed. To the best of our knowledge, i6mA-Fuse (Identification of N6-MethylAdenine sites by Fusing multiple feature representation) is the first computational model proposed to predict 6mA sites in the Rosaceae genomes, especially in Rosa chinensis and Fragaria vesca.

Abstract

DNA N6-methyladenine (6mA) is one of the most vital epigenetic modifications and is involved in regulating gene expression. With the avalanche of DNA sequences deposited in numerous databases, accurate identification of 6mA plays an essential role in understanding molecular mechanisms. Because experimental approaches are time-consuming and costly, a computational model that identifies 6mA rapidly and accurately is desirable. To the best of our knowledge, we proposed the first computational model, named i6mA-Fuse, to predict 6mA sites in the Rosaceae genomes, especially in Rosa chinensis and Fragaria vesca. We implemented five encoding schemes, i.e., mononucleotide binary (MBE), dinucleotide binary, k-space spectral nucleotide, k-mer (Kmer), and electron-ion interaction pseudopotential (EIIP) compositions, to build five single-encoding random forest (RF) models. i6mA-Fuse uses a linear regression model to combine the predicted probability scores of the five single-encoding RF models. The resultant species-specific i6mA-Fuse models achieved AUCs of 0.982 and 0.978 and MCCs of 0.869 and 0.858 on the independent datasets of Rosa chinensis and Fragaria vesca, respectively. In the F. vesca-specific i6mA-Fuse, MBE and EIIP contributed 75% and 25% of the total prediction; in the R. chinensis-specific i6mA-Fuse, Kmer, MBE, and EIIP contributed 15%, 65%, and 20% of the total prediction.
To assist high-throughput prediction of DNA 6mA sites, i6mA-Fuse is publicly accessible at .
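
The encoding and fusion steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, only two of the five encodings are shown, and treating the reported percentage contributions as linear weights on the per-encoding RF probability scores is an assumption made here for illustration. The random forest models themselves are omitted, so the probability scores are taken as given inputs.

```python
from itertools import product

NUCLEOTIDES = "ACGT"


def mononucleotide_binary(seq):
    """One-hot encode each base (A=1000, C=0100, G=0010, T=0001)."""
    vec = []
    for base in seq.upper():
        vec.extend(1 if base == n else 0 for n in NUCLEOTIDES)
    return vec


def kmer_composition(seq, k=2):
    """Normalized frequency of every possible k-mer, in lexicographic order."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product(NUCLEOTIDES, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = len(seq) - k + 1
    if total <= 0:
        return [0.0] * len(kmers)
    for i in range(total):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing non-ACGT symbols
            counts[window] += 1
    return [counts[m] / total for m in kmers]


def fuse_scores(scores, weights):
    """Weighted linear combination of per-encoding probability scores."""
    return sum(weights[name] * scores[name] for name in weights)


# Assumption: the contributions reported in the abstract are used directly
# as fusion weights; the fitted regression coefficients are not given there.
F_VESCA_WEIGHTS = {"MBE": 0.75, "EIIP": 0.25}
R_CHINENSIS_WEIGHTS = {"Kmer": 0.15, "MBE": 0.65, "EIIP": 0.20}
```

For example, if hypothetical MBE- and EIIP-based models return probabilities of 0.8 and 0.4 for a candidate site, `fuse_scores({"MBE": 0.8, "EIIP": 0.4}, F_VESCA_WEIGHTS)` gives a fused score of about 0.7.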
