4.7 Article

i6mA-stack: A stacking ensemble-based computational prediction of DNA N6-methyladenine (6mA) sites in the Rosaceae genome

Journal

GENOMICS
Volume 113, Issue 1, Pages 582-592

Publisher

ACADEMIC PRESS INC ELSEVIER SCIENCE
DOI: 10.1016/j.ygeno.2020.09.054

Keywords

Sequence analysis; DNA N6-methyladenine; Machine learning; RFECV; Stacking

Funding

  1. National Research Foundation of Korea(NRF) - Korea government (MSIT) [2020R1A2C2005612]
  2. Brain Research Program of the National Research Foundation (NRF) - Korean government (MSIT) [NRF-2017M3C7A1044816]
  3. Basic Science Research Program through the National Research Foundation of Korea (NRF) - Ministry of Education [2019R1A6A3A01094685]
  4. National Research Foundation of Korea [2019R1A6A3A01094685, 2020R1A2C2005612] Funding Source: Korea Institute of Science & Technology Information (KISTI), National Science & Technology Information Service (NTIS)

This study proposes a machine learning technique to identify DNA N6-methyladenine (6mA) sites in Rosa chinensis and Fragaria vesca. A recursive feature elimination with cross-validation strategy extracts an optimal feature subset from five DNA sequence encoding schemes, and that subset is used to train a two-layer machine learning-based stacking model, yielding a bioinformatics tool named 'i6mA-stack'.
DNA N6-methyladenine (6mA) is an epigenetic modification that plays a vital role in a variety of cellular processes in both eukaryotes and prokaryotes. Accurate information on 6mA sites in the Rosaceae genome may assist in understanding genomic 6mA distributions and various biological functions such as epigenetic inheritance. Various studies have shown that 6mA sites can be identified experimentally, but the procedures are time-consuming and costly. To overcome the drawbacks of experimental methods, we propose an accurate computational paradigm based on a machine learning (ML) technique to identify 6mA sites in Rosa chinensis (R. chinensis) and Fragaria vesca (F. vesca). To improve the performance of the proposed model and to avoid overfitting, a recursive feature elimination with cross-validation (RFECV) strategy is used to extract the optimal number of features (ONF) subset from five different DNA sequence encoding schemes: Binary Encoding (BE), Ring-Function-Hydrogen-Chemical Properties (RFHC), Electron-Ion-Interaction Pseudo Potentials of Nucleotides (EIIP), Dinucleotide Physicochemical Properties (DPCP), and Trinucleotide Physicochemical Properties (TPCP). Subsequently, we use the ONF subset to train a two-layer ML-based stacking model to create a bioinformatics tool named 'i6mA-stack'.
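The pipeline described in the abstract (sequence encoding, RFECV feature selection, then a two-layer stacking model) can be illustrated with a minimal scikit-learn sketch. It is not the authors' implementation. The toy sequences, the window length of 21, the planted signal, the helper names `encode` and `make_toy_data`, and the choice of random forest, SVM, and logistic regression as learners are all assumptions made for illustration, and only two of the five encodings (BE and EIIP, using the commonly cited EIIP values) are shown.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Binary (one-hot) encoding: each nucleotide becomes a 4-bit vector.
BINARY = {"A": [1, 0, 0, 0], "C": [0, 1, 0, 0], "G": [0, 0, 1, 0], "T": [0, 0, 0, 1]}
# Commonly cited EIIP values per nucleotide.
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}


def encode(seq):
    """Concatenate the binary and EIIP encodings of one DNA sequence."""
    binary = [bit for base in seq for bit in BINARY[base]]
    eiip = [EIIP[base] for base in seq]
    return np.array(binary + eiip, dtype=float)


def make_toy_data(n=120, length=21, seed=0):
    """Hypothetical data: random windows centred on an adenine.

    Positive samples get a planted 'G' right after the centre so that the
    toy task is learnable; this is NOT a real 6mA motif.
    """
    rng = np.random.default_rng(seed)
    seqs, labels = [], []
    for i in range(n):
        bases = list(rng.choice(list("ACGT"), size=length))
        bases[length // 2] = "A"
        label = i % 2
        if label == 1:
            bases[length // 2 + 1] = "G"
        seqs.append("".join(bases))
        labels.append(label)
    return seqs, np.array(labels)


seqs, y = make_toy_data()
X = np.vstack([encode(s) for s in seqs])  # shape: (120, 21 * 5)

# RFECV: repeatedly drop the least important features and keep the
# feature count with the best cross-validated score.
selector = RFECV(
    estimator=RandomForestClassifier(n_estimators=50, random_state=0),
    step=5,
    cv=3,
    scoring="accuracy",
    min_features_to_select=5,
)
X_sel = selector.fit_transform(X, y)

# Two-layer stacking: base learners form the first layer, and a
# meta-learner trained on their cross-validated predictions forms the second.
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("svm", SVC(probability=True, random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=3,
)
stack.fit(X_sel, y)
predictions = stack.predict(X_sel)
```

After fitting, `selector.n_features_` reports how many features RFECV retained and `selector.support_` marks which ones. In a real evaluation the selector and the stacking model would be fitted on training data only and scored on an independent test set.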

Recommended

Article Biochemical Research Methods

DeepCap-Kcr: accurate identification and investigation of protein lysine crotonylation sites based on capsule network

Jhabindra Khanal, Hilal Tayara, Quan Zou, Kil To Chong

Summary: In this study, a new deep learning model, DeepCap-Kcr, was proposed for accurate prediction of Kcr sites in proteins. The model outperformed existing methods and could learn internal hierarchical representations and important features from a small number of samples.

BRIEFINGS IN BIOINFORMATICS (2022)

Article Chemistry, Multidisciplinary

iEnhancer-Deep: A Computational Predictor for Enhancer Sites and Their Strength Using Deep Learning

Haider Kamran, Muhammad Tahir, Hilal Tayara, Kil To Chong

Summary: Enhancers are short motifs with high position variability and play an important role in gene regulation. Identification of enhancers is challenging due to their complexity, but recent advancements in computational tools and deep learning frameworks have shown comparable results with state-of-the-art methodologies.

APPLIED SCIENCES-BASEL (2022)

Article Biochemistry & Molecular Biology

XML-CIMT: Explainable Machine Learning (XML) Model for Predicting Chemical-Induced Mitochondrial Toxicity

Keerthana Jaganathan, Mobeen Ur Rehman, Hilal Tayara, Kil To Chong

Summary: In this study, an explainable machine-learning model was proposed to classify compounds with mitochondrial toxicity and non-toxicity. After experiments, the model achieved high prediction accuracy and showed significant improvement compared to existing methods.

INTERNATIONAL JOURNAL OF MOLECULAR SCIENCES (2022)

Review Biochemistry & Molecular Biology

Recent Studies of Artificial Intelligence on In Silico Drug Distribution Prediction

Thi Tuyet Van Tran, Hilal Tayara, Kil To Chong

Summary: Drug distribution is a crucial process in pharmacokinetics, as it affects the effectiveness and safety of the drug. Lack of efficacy and uncontrollable toxicity are the major causes of drug failures. Advances in drug distribution property prediction, particularly through in silico methods, have reduced screening time and costs. This study provides comprehensive knowledge on drug distribution, including influencing factors and artificial intelligence-based prediction models. The review also presents future challenges and research directions, aiming to facilitate innovative approaches in drug discovery.

INTERNATIONAL JOURNAL OF MOLECULAR SCIENCES (2023)

Article Chemistry, Multidisciplinary

Attention-Based Graph Neural Network for Molecular Solubility Prediction

Waciar Ahmad, Hilal Tayara, Kil To Chong

Summary: Drug discovery (DD) research aims to discover new medications. Solubility is an important property in drug development. Aqueous solubility (AS) is a key attribute required for API characterization. In this study, deep learning models were created to predict the solubility of a wide range of molecules using the largest currently available solubility data set. The models were trained and tested on 9943 compounds, with the AttentiveFP-based network model outperforming on 62 anticancer compounds.

ACS OMEGA (2023)

Article Biochemical Research Methods

Sars-escape network for escape prediction of SARS-COV-2

Prem Singh Bist, Hilal Tayara, Kil To Chong

Summary: We developed a computational model that accurately identifies viral escape mutational sequences based on natural language processing and prior knowledge of experimentally validated escape mutants. This model can be applied to other viruses using knowledge of escape mutants and protein sequence datasets.

BRIEFINGS IN BIOINFORMATICS (2023)

Article Biochemical Research Methods

DL-m6A: Identification of N6-Methyladenosine Sites in Mammals Using Deep Learning Based on Different Encoding Schemes

Mobeen Ur Rehman, Hilal Tayara, Kil To Chong

Summary: In this study, a novel tool called DL-m6A is proposed for the identification of m6A sites in mammals using deep learning based on different encoding schemes. The tool utilizes three encoding schemes to provide contextual feature representation to the input RNA sequence. The results demonstrate that the proposed tool outperforms existing tools and can be of great use for biology experts.

IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS (2023)

Review Chemistry, Medicinal

Artificial Intelligence in Drug Toxicity Prediction: Recent Advances, Challenges, and Future Perspectives

Thi Tuyet Van Tran, Agung Surya Wibowo, Hilal Tayara, Kil To Chong

Summary: Toxicity prediction in drug discovery is crucial for identifying safe and effective compounds, reducing late-stage failures. Artificial intelligence has shown promise in improving drug toxicity prediction through accurate and efficient methods. This review provides an overview of recent advances in AI-based drug toxicity prediction and highlights challenges and future perspectives, aiding researchers in understanding toxicity prediction and advancing drug discovery methods.

JOURNAL OF CHEMICAL INFORMATION AND MODELING (2023)

Review Pharmacology & Pharmacy

Artificial Intelligence in Drug Metabolism and Excretion Prediction: Recent Advances, Challenges, and Future Perspectives

Thi Tuyet Van Tran, Hilal Tayara, Kil To Chong

Summary: Drug metabolism and excretion are crucial in determining drug efficacy and safety. Artificial intelligence (AI) has emerged as a powerful tool for predicting these processes, offering potential for faster drug development and improved success rates. This review highlights recent advancements in AI-based prediction of drug metabolism and excretion, including deep learning and machine learning algorithms. It also provides a list of public data sources and prediction tools, discusses challenges in AI model development, and explores future perspectives in the field.

PHARMACEUTICS (2023)

Article Biochemical Research Methods

ORI-Explorer: a unified cell-specific tool for origin of replication sites prediction by feature fusion

Zeeshan Abbas, Mobeen Ur Rehman, Hilal Tayara, Kil To Chong

Summary: In this article, a unique artificial intelligence-based technique called ORI-Explorer is developed to recognize origins of replication sites (ORIs) in four different eukaryotic species. ORI-Explorer combines multiple feature engineering techniques and utilizes the CatBoost Classifier. It outperforms existing predictors and provides key insights into model success through the SHapley Additive exPlanation method. ORI-Explorer aims to aid community-wide efforts in discovering potential ORIs and developing verifiable biological hypotheses.

BIOINFORMATICS (2023)

Article Biology

An ensemble of stacking classifiers for improved prediction of miRNA-mRNA interactions

Priyash Dhakal, Hilal Tayara, Kil To Chong

Summary: We developed a stacking classifier algorithm that surpasses previous algorithms in predicting functional miRNA targets by effectively selecting conservative candidate target sites using feature encoding techniques.

COMPUTERS IN BIOLOGY AND MEDICINE (2023)

Review Chemistry, Medicinal

Recent Studies of Artificial Intelligence on In Silico Drug Absorption

Thi Tuyet Van Tran, Hilal Tayara, Kil To Chong

Summary: Drug absorption is a crucial aspect in pharmaceutical research and development, and its prediction using in silico methods, particularly artificial intelligence, has shown promising results in reducing time and cost for screening drug candidates. This report provides an overview of recent studies on predicting absorption properties and highlights challenges and future directions in this field.

JOURNAL OF CHEMICAL INFORMATION AND MODELING (2023)

Article Biochemical Research Methods

iCpG-Pos: an accurate computational approach for identification of CpG sites using positional features on single-cell whole genome sequence data

Sehi Park, Mobeen Ur Rehman, Farman Ullah, Hilal Tayara, Kil To Chong, Inanc Birol

Summary: The study developed positional features for predicting CpG site methylation patterns, using optimized classifiers and ensemble learning approaches. The CatBoost algorithm followed by the stacking algorithm outperformed existing DNA methylation identifiers. The proposed iCpG-Pos approach offers both accuracy and efficiency, making it a promising tool for advancing DNA methylation research and its applications in human health and well-being.

BIOINFORMATICS (2023)

Article Biochemistry & Molecular Biology

Identification of piRNA disease associations using deep learning

Syed Danish Ali, Hilal Tayara, Kil To Chong

Summary: piRNAs play a crucial role in maintaining genome integrity, and piRDA is an effective deep learning method for identifying piRNA-disease associations, facilitating drug development.

COMPUTATIONAL AND STRUCTURAL BIOTECHNOLOGY JOURNAL (2022)

Article Biochemical Research Methods

Identification of Functional piRNAs Using a Convolutional Neural Network

Syed Danish Ali, Waleed Alam, Hilal Tayara, Kil To Chong

Summary: piRNAs are a class of small RNAs that play important roles in maintaining germline cells, gene stability, and genome integrity, and are associated with various cancers. A predictor based on a deep learning architecture has been proposed, showing significant improvements in piRNA prediction and target mRNA deadenylation compared to existing computational methods.

IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS (2022)

Article Biotechnology & Applied Microbiology

MetaAc4C: A multi-module deep learning framework for accurate prediction of N4-acetylcytidine sites based on pre-trained bidirectional encoder representation and generative adversarial networks

Zutan Li, Bingbing Jin, Jingya Fang

Summary: In this study, we propose MetaAc4C, an advanced deep learning model for accurate identification of N4-acetylcytidine (ac4C) sites using pre-trained BERT and various optimization techniques. By adapting generative adversarial networks to address data imbalance and augmenting training RNA samples, our model outperforms existing methods in terms of ACC, MCC, and AUROC.

GENOMICS (2024)