Article

NeuroPred-FRL: an interpretable prediction model for identifying neuropeptide using feature representation learning

Journal

BRIEFINGS IN BIOINFORMATICS
Volume 22, Issue 6, Pages -

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bib/bbab167

Keywords

neuropeptide; feature representation learning; two-step feature selection; machine learning; cross-validation

Funding

  1. Japan Society for the Promotion of Science (JSPS) [19H04208, 19F19377]
  2. National Research Foundation of Korea (NRF) - Korean government (MSIT) [2021R1A2C1014338]
  3. Grants-in-Aid for Scientific Research [19F19377] Funding Source: KAKEN
  4. National Research Foundation of Korea [2021R1A2C1014338] Funding Source: Korea Institute of Science & Technology Information (KISTI), National Science & Technology Information Service (NTIS)

Neuropeptides play a crucial role in regulating immune systems, and rapid and accurate identification of them is essential for basic research and drug development. Using machine learning and feature representation learning, the developed NeuroPred-FRL predictor demonstrates superior prediction performance, serving as a powerful tool for large-scale identification of neuropeptides.
Neuropeptides (NPs) are the most versatile neurotransmitters in the immune systems that regulate various central anxious hormones. An efficient and effective bioinformatics tool for rapid and accurate large-scale identification of NPs is critical in immunoinformatics and indispensable for basic research and drug development. Although a few NP prediction tools have been developed, their prediction performance still needs to be improved. In this study, we developed a machine learning-based meta-predictor called NeuroPred-FRL by employing the feature representation learning approach. First, we generated 66 optimal baseline models by employing 11 different encodings, six different classifiers and a two-step feature selection approach. The predicted NP probability scores from the 66 baseline models were combined to form the input feature vector. Second, to enhance the feature representation ability, we applied the two-step feature selection approach to optimize the 66-D probability feature vector and then fed the optimal vector into a random forest classifier to construct the final meta-model (NeuroPred-FRL). Benchmarking experiments based on both cross-validation and independent tests indicate that NeuroPred-FRL achieves superior NP prediction performance compared with other state-of-the-art predictors. We believe that the proposed NeuroPred-FRL can serve as a powerful tool for large-scale identification of NPs, facilitating the characterization of their functional mechanisms and expediting their applications in clinical therapy. Moreover, we interpreted some model mechanisms of NeuroPred-FRL by leveraging the robust SHapley Additive exPlanations (SHAP) algorithm.
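
The two-stage idea in the abstract (baseline models produce probability scores, which become the feature vector for a meta-model) can be illustrated with a toy sketch. This is not the authors' implementation. Everything here is an assumption made for illustration: the two encodings, the two simple scorers, the 2-fold split, and the synthetic sequences and labels (which are not real neuropeptide data). The paper uses 11 encodings and six classifiers (66 baseline models), a two-step feature selection, and a random forest meta-model; the sketch uses 2 x 2 = 4 baseline models, skips feature selection, and substitutes a nearest-centroid scorer for the random forest.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AVILMFWY")
CHARGED = set("DEKR")

def aac(seq):
    """Encoding 1 (illustrative): amino acid composition, 20 fractions."""
    counts = Counter(seq)
    return [counts[a] / len(seq) for a in AMINO_ACIDS]

def group_composition(seq):
    """Encoding 2 (illustrative): hydrophobic / charged / other fractions."""
    hyd = sum(c in HYDROPHOBIC for c in seq) / len(seq)
    chg = sum(c in CHARGED for c in seq) / len(seq)
    return [hyd, chg, 1.0 - hyd - chg]

def fit_centroid(X, y):
    """Toy scorer: relative distance to class centroids, in [0, 1]."""
    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]
    mu_pos = mean([x for x, t in zip(X, y) if t == 1])
    mu_neg = mean([x for x, t in zip(X, y) if t == 0])
    def predict(x):
        d_pos, d_neg = math.dist(x, mu_pos), math.dist(x, mu_neg)
        total = d_pos + d_neg
        return d_neg / total if total else 0.5
    return predict

def fit_knn(X, y, k=3):
    """Toy scorer: fraction of positives among the k nearest neighbours."""
    def predict(x):
        nearest = sorted(zip(X, y), key=lambda pair: math.dist(x, pair[0]))[:k]
        return sum(t for _, t in nearest) / len(nearest)
    return predict

def out_of_fold_scores(X, y, fit, n_folds=2):
    """Score each sample with a model that never saw it during fitting."""
    scores = [0.0] * len(X)
    for fold in range(n_folds):
        train = [i for i in range(len(X)) if i % n_folds != fold]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in range(fold, len(X), n_folds):
            scores[i] = model(X[i])
    return scores

def probability_features(seqs, y, encoders, fitters):
    """One column of probability scores per (encoding, classifier) pair."""
    columns = []
    for encode in encoders:
        X = [encode(s) for s in seqs]
        for fit in fitters:
            columns.append(out_of_fold_scores(X, y, fit))
    return [list(row) for row in zip(*columns)]

# Synthetic sequences and labels, made up purely to exercise the code.
seqs = ["KRDEKRDEAA", "RRKKDDEEGA", "AVILMFWYAV", "LLVVIIFFMA",
        "DEKRKRDESG", "KKRRDEDEAT", "FWYAVILMLL", "IVLAMFWYGA"]
labels = [1, 1, 0, 0, 1, 1, 0, 0]

features = probability_features(seqs, labels,
                                [aac, group_composition],
                                [fit_centroid, fit_knn])

# Meta-model stand-in (assumption): the paper uses a random forest here.
meta_model = fit_centroid(features, labels)
final_scores = [meta_model(row) for row in features]
```

Each row of `features` plays the role of the 66-D probability vector described in the abstract, here with only four dimensions. Out-of-fold scoring is one common way to keep the meta-model from seeing scores produced on a baseline model's own training data; whether the authors used exactly this scheme is not stated in the abstract.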

Recommended

Article Biochemical Research Methods

SiameseCPP: a sequence-based Siamese network to predict cell-penetrating peptides by contrastive learning

Xin Zhang, Lesong Wei, Xiucai Ye, Kai Zhang, Saisai Teng, Zhongshen Li, Junru Jin, Minjae Kim, Tetsuya Sakurai, Lizhen Cui, Balachandran Manavalan, Leyi Wei

Summary: A novel deep learning framework SiameseCPP is proposed for automated prediction of cell-penetrating peptides (CPPs). SiameseCPP learns discriminative representations of CPPs based on a well-pretrained model and a Siamese neural network comprising a transformer and gated recurrent units. Comprehensive experiments demonstrate that SiameseCPP outperforms existing baseline models for CPP prediction and exhibits satisfactory generalization ability on other functional peptide datasets.

BRIEFINGS IN BIOINFORMATICS (2023)

Article Biology

PSRTTCA: A new approach for improving the prediction and characterization of tumor T cell antigens using propensity score representation learning

Phasit Charoenkwan, Chonlatip Pipattanaboon, Chanin Nantasenamat, Md Mehedi Hasan, Mohammad Ali Moni, Pietro Lio, Watshara Shoombuatong

Summary: Despite existing cancer therapies, the development of new and effective treatments is necessary to address the ongoing cancer recurrence and new cases. This study proposes a new machine learning-based approach, PSRTTCA, for improving the identification and characterization of tumor T cell antigens (TTCAs) based on their primary sequences.

COMPUTERS IN BIOLOGY AND MEDICINE (2023)

Article Biochemistry & Molecular Biology

GPApred: The first computational predictor for identifying proteins with LPXTG-like motif using sequence-based optimal features

Adeel Malik, Watshara Shoombuatong, Chang-Bae Kim, Balachandran Manavalan

Summary: A machine learning-based predictor called GPApred was developed to identify LPXTG-like proteins from their primary sequences. This predictor can be utilized for functional characterization and drug targeting in further research.

INTERNATIONAL JOURNAL OF BIOLOGICAL MACROMOLECULES (2023)

Article Biology

Computational prediction of protein folding rate using structural parameters and network centrality measures

Saraswathy Nithiyanandam, Vinoth Kumar Sangaraju, Balachandran Manavalan, Gwang Lee

Summary: Protein folding is a complex process where a polymer of amino acids transitions from an unfolded state to a unique three-dimensional structure. Previous studies have identified structural parameters and examined their relationship with protein folding rate, but these parameters are only applicable to a limited set of proteins. Machine learning models have been proposed, but they fail to explain plausible folding mechanisms. In this study, ten different machine learning algorithms were evaluated using various structural parameters and network centrality measures, with support vector machine showing the best predictive capability.

COMPUTERS IN BIOLOGY AND MEDICINE (2023)

Article Biology

PSRQSP: An effective approach for the interpretable prediction of quorum sensing peptide using propensity score representation learning

Phasit Charoenkwan, Pramote Chumnanpuen, Nalini Schaduangrat, Changmin Oh, Balachandran Manavalan, Watshara Shoombuatong

Summary: In this study, a novel computational approach called PSRQSP was developed to improve the prediction and analysis of quorum sensing peptides (QSPs). Experimental results showed that PSRQSP outperformed conventional methods in identifying QSPs. An easy-to-use web server was also built to accelerate the discovery of potential QSPs for drug development.

COMPUTERS IN BIOLOGY AND MEDICINE (2023)

Article Biochemistry & Molecular Biology

Pretoria: An effective computational approach for accurate and high-throughput identification of CD8+ T-cell epitopes of eukaryotic pathogens

Phasit Charoenkwan, Nalini Schaduangrat, Nhat Truong Pham, Balachandran Manavalan, Watshara Shoombuatong

Summary: This study proposes Pretoria, the first stack-based approach for accurate and large-scale identification of CD8+ T-cell epitopes (TCEs) of eukaryotic pathogens. A pool of 144 different machine learning (ML)-based classifiers was constructed from 12 popular ML algorithms, and a feature selection method was used to determine the important ML classifiers for building the stacked model. Experimental results demonstrated that Pretoria outperformed several conventional ML classifiers and the existing method, with an accuracy of 0.866, MCC of 0.732, and AUC of 0.921 in the independent test.

INTERNATIONAL JOURNAL OF BIOLOGICAL MACROMOLECULES (2023)

Article Biochemistry & Molecular Biology

PRR-HyPred: A two-layer hybrid framework to predict pattern recognition receptors and their families by employing sequence encoded optimal features

Ahmad Firoz, Adeel Malik, Hani Mohammed Ali, Yusuf Akhter, Balachandran Manavalan, Chang-Bae Kim

Summary: In this study, a new two-layer hybrid framework called PRR-HyPred was constructed to simultaneously predict and classify pattern recognition receptors (PRRs). Using support vector machine and random forest-based classifiers, PRR-HyPred achieved accuracies of 83.4% and 95% in the first and second layers, respectively. This is the first study that can predict PRRs and classify them into specific families, and it can be a valuable tool for large-scale PRR prediction and classification, facilitating future studies.

INTERNATIONAL JOURNAL OF BIOLOGICAL MACROMOLECULES (2023)

Article Biochemistry & Molecular Biology

Automatic Generation of SBML Kinetic Models from Natural Language Texts Using GPT

Kazuhiro Maeda, Hiroyuki Kurata

Summary: This article presents a new approach called KinModGPT that generates kinetic models directly from natural language text. KinModGPT utilizes GPT as a natural language interpreter and Tellurium as an SBML generator. The effectiveness of KinModGPT in creating SBML kinetic models from complex natural language descriptions is demonstrated, including metabolic pathways, protein-protein interaction networks, and heat shock response. This article showcases the potential of KinModGPT in kinetic modeling automation.

INTERNATIONAL JOURNAL OF MOLECULAR SCIENCES (2023)

Article Computer Science, Artificial Intelligence

MonkeyNet: A robust deep convolutional neural network for monkeypox disease detection and classification

Diponkor Bala, Md. Shamim Hossain, Mohammad Alamgir Hossain, Md. Ibrahim Abdullah, Md. Mizanur Rahman, Balachandran Manavalan, Naijie Gu, Mohammad S. Islam, Zhangjin Huang

Summary: The monkeypox virus poses a new pandemic threat. However, there is currently no reliable monkeypox database available for training and testing deep learning models. The MSID dataset has been developed for this purpose, providing a collection of monkeypox patient images for building confident deep learning models. The proposed MonkeyNet model can accurately identify monkeypox disease and assist doctors in making early diagnoses.

NEURAL NETWORKS (2023)

Review Biochemical Research Methods

A comprehensive revisit of the machine-learning tools developed for the identification of enhancers in the human genome

Le Thi Phan, Changmin Oh, Tao He, Balachandran Manavalan

Summary: Enhancers are non-coding DNA elements that enhance the transcription rate of specific genes. Computational platforms have been developed to complement experimental methods in identifying enhancers. This review provides an overview of machine learning-based prediction methods and databases for enhancer identification and discusses the advantages and drawbacks of these methods, as well as guidelines for developing more efficient enhancer predictors.

PROTEOMICS (2023)

Article Computer Science, Artificial Intelligence

Hybrid data augmentation and deep attention-based dilated convolutional-recurrent neural networks for speech emotion recognition

Nhat Truong Pham, Duc Ngoc Minh Dang, Ngoc Duy Nguyen, Thanh Thi Nguyen, Hai Nguyen, Balachandran Manavalan, Chee Peng Lim, Sy Dzung Nguyen

Summary: This paper proposes a deep learning framework for speech emotion recognition, which combines a hybrid data augmentation method and deep attention-based dilated convolutional-recurrent neural networks. The framework is able to extract high-level representations from three-dimensional log Mel spectrogram features. Experimental results show that the proposed framework outperforms other state-of-the-art methods on the EmoDB and ERC datasets.

EXPERT SYSTEMS WITH APPLICATIONS (2023)

Article Biology

Unveiling local and global conformational changes and allosteric communications in SOD1 systems using molecular dynamics simulation and network analyses

Shaherin Basith, Balachandran Manavalan, Gwang Lee

Summary: This study combined microsecond-scale unbiased molecular dynamics simulation with network analysis to elucidate the local and global conformational changes and allosteric communications in SOD1 systems. Structural analyses revealed significant variations in catalytic sites and stability due to unmetallated SOD1 systems and cysteine mutations. Dynamic motion analysis showed balanced atomic displacement and highly correlated motions in the Holo system.

COMPUTERS IN BIOLOGY AND MEDICINE (2024)

Article Multidisciplinary Sciences

TROLLOPE: A novel sequence-based stacked approach for the accelerated discovery of linear T-cell epitopes of hepatitis C virus

Phasit Charoenkwan, Sajee Waramit, Pramote Chumnanpuen, Nalini Schaduangrat, Watshara Shoombuatong

Summary: HCV infection causes chronic liver diseases, and there is no effective vaccine available. This study proposes a novel approach called TROLLOPE to accurately identify TCE-HCVs from sequence information, with superior predictive performance.

PLOS ONE (2023)
