Article

Improving protein-protein interactions prediction accuracy using XGBoost feature selection and stacked ensemble classifier

Journal

COMPUTERS IN BIOLOGY AND MEDICINE
Volume 123

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.compbiomed.2020.103899

Keywords

Protein-protein interactions; Multi-information fusion; XGBoost; Stacked ensemble classifier

Funding

  1. National Natural Science Foundation of China [61863010]
  2. Key Research and Development Program of Shandong Province of China [2019GGX101001]
  3. Natural Science Foundation of Shandong Province of China [ZR2018MC007, ZR2019MEE066]

Protein-protein interactions (PPIs) are involved in most cellular activities at the proteomic level, so studying PPIs is essential to understanding biological processes. Machine learning approaches have made PPI prediction more accurate and more generalizable. In this paper, we propose a predictive framework called StackPPI. First, we encode biologically relevant features using pseudo amino acid composition; Moreau-Broto, Moran and Geary autocorrelation descriptors; amino acid composition position-specific scoring matrix (AAC-PSSM); bi-gram position-specific scoring matrix (Bi-gram PSSM); and composition, transition and distribution (CTD). Second, we use XGBoost to remove feature noise and reduce dimensionality, ranking features by their average gain in gradient boosting. Third, the resulting optimized features are passed to StackPPI, a PPI predictor built on a stacked ensemble classifier that combines random forest, extremely randomized trees and logistic regression. Five-fold cross-validation shows that StackPPI predicts PPIs with an ACC of 89.27%, an MCC of 0.7859 and an AUC of 0.9561 on Helicobacter pylori, and with an ACC of 94.64%, an MCC of 0.8934 and an AUC of 0.9810 on Saccharomyces cerevisiae. On independent test sets, StackPPI achieves higher prediction accuracy than state-of-the-art models. Finally, we show that StackPPI can infer biologically significant PPI networks. Its accurate prediction of functional pathways makes StackPPI well suited to studying the mechanisms underlying PPIs, particularly in drug design.
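The three autocorrelation descriptors named above can be sketched in plain Python. This is a minimal illustration assuming the standard normalized Moreau-Broto, Moran and Geary definitions for a single physicochemical property at lag `d`; the property scale `TOY_SCALE` and all function names are hypothetical. Real encoders typically use several published property scales, standardized over the 20 amino acids, and compute many lags.

```python
from typing import Dict, List

# Hypothetical toy property scale; real encoders use published, standardized scales.
TOY_SCALE: Dict[str, float] = {"A": 1.0, "C": 2.0, "D": 3.0}


def _values(seq: str, scale: Dict[str, float]) -> List[float]:
    """Map each residue to its property value."""
    return [scale[aa] for aa in seq]


def moreau_broto(seq: str, scale: Dict[str, float], d: int) -> float:
    """Normalized Moreau-Broto: mean of P_i * P_{i+d} over all residue pairs at lag d."""
    p = _values(seq, scale)
    n = len(p)
    return sum(p[i] * p[i + d] for i in range(n - d)) / (n - d)


def moran(seq: str, scale: Dict[str, float], d: int) -> float:
    """Moran: lag-d mean product of deviations from the mean, divided by the variance."""
    p = _values(seq, scale)
    n = len(p)
    mean = sum(p) / n
    num = sum((p[i] - mean) * (p[i + d] - mean) for i in range(n - d)) / (n - d)
    den = sum((x - mean) ** 2 for x in p) / n
    return num / den


def geary(seq: str, scale: Dict[str, float], d: int) -> float:
    """Geary: half the mean squared lag-d difference, divided by the sample variance."""
    p = _values(seq, scale)
    n = len(p)
    mean = sum(p) / n
    num = sum((p[i] - p[i + d]) ** 2 for i in range(n - d)) / (2 * (n - d))
    den = sum((x - mean) ** 2 for x in p) / (n - 1)
    return num / den


print(moreau_broto("ACDAC", TOY_SCALE, 1))  # 3.25
print(moran("ACDAC", TOY_SCALE, 1), geary("ACDAC", TOY_SCALE, 1))
```

Moreau-Broto measures raw co-occurrence of property values, whereas Moran and Geary are centered or difference-based and therefore less sensitive to the overall magnitude of the scale.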
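The two PSSM-derived encodings can be illustrated with NumPy. The sketch assumes the common definitions: AAC-PSSM averages each of the 20 PSSM columns over the sequence (20 values), and Bi-gram PSSM sums products of consecutive rows, B[m, n] = Σ_i P[i, m]·P[i+1, n] (400 values). The logistic squashing of raw scores in `squash` and the concatenation of both proteins' vectors in `encode_pair` are assumptions for illustration, not necessarily how StackPPI normalizes or combines its features.

```python
import numpy as np


def squash(pssm):
    """Map raw PSSM log-odds scores into (0, 1) with a logistic function (assumed normalization)."""
    return 1.0 / (1.0 + np.exp(-np.asarray(pssm, dtype=float)))


def aac_pssm(pssm):
    """AAC-PSSM: column means of an L x 20 PSSM -> 20 features."""
    return np.asarray(pssm, dtype=float).mean(axis=0)


def bigram_pssm(pssm):
    """Bi-gram PSSM: B[m, n] = sum_i P[i, m] * P[i + 1, n] -> 20 x 20 = 400 features."""
    p = np.asarray(pssm, dtype=float)
    # (20, L-1) @ (L-1, 20): each entry sums products of adjacent residues' scores.
    return (p[:-1].T @ p[1:]).ravel()


def encode_pair(pssm_a, pssm_b):
    """Hypothetical pair encoding: concatenate both proteins' PSSM features (2 x 420 values)."""
    feats = [
        np.concatenate([aac_pssm(squash(p)), bigram_pssm(squash(p))])
        for p in (pssm_a, pssm_b)
    ]
    return np.concatenate(feats)


rng = np.random.default_rng(0)
pssm_a = rng.integers(-5, 6, size=(30, 20))  # toy PSSM for a 30-residue protein
pssm_b = rng.integers(-5, 6, size=(45, 20))  # toy PSSM for a 45-residue protein
x = encode_pair(pssm_a, pssm_b)
print(x.shape)  # (840,)
```

Both encodings turn a variable-length PSSM into a fixed-length vector, which is what lets proteins of different lengths be fed to the same classifier.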
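A stacked ensemble of the three algorithms, evaluated with five-fold cross-validation, can be sketched with scikit-learn. The abstract does not say which learner sits at which level; this sketch assumes random forest and extremely randomized trees as base learners and logistic regression as the meta-learner. The hyperparameters and the synthetic `make_classification` data are chosen purely for illustration. Metrics are computed on pooled out-of-fold predictions, which can differ slightly from averaging per-fold scores.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, matthews_corrcoef, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict


def build_stack(seed=0):
    """RF and ET base learners; LR meta-learner trained on their out-of-fold probabilities."""
    base_learners = [
        ("rf", RandomForestClassifier(n_estimators=100, random_state=seed)),
        ("et", ExtraTreesClassifier(n_estimators=100, random_state=seed)),
    ]
    return StackingClassifier(
        estimators=base_learners,
        final_estimator=LogisticRegression(max_iter=1000),
        cv=5,  # inner folds that generate the meta-learner's training inputs
    )


def evaluate_5fold(X, y, seed=0):
    """Outer five-fold CV; returns ACC, MCC and AUC on pooled out-of-fold predictions."""
    outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    proba = cross_val_predict(build_stack(seed), X, y, cv=outer, method="predict_proba")[:, 1]
    pred = (proba >= 0.5).astype(int)
    return {
        "ACC": accuracy_score(y, pred),
        "MCC": matthews_corrcoef(y, pred),
        "AUC": roc_auc_score(y, proba),
    }


X, y = make_classification(n_samples=300, n_features=20, n_informative=5, random_state=0)
scores = evaluate_5fold(X, y)
print({name: round(value, 4) for name, value in scores.items()})
```

Training the meta-learner on out-of-fold base predictions, rather than on predictions for data the forests have already seen, keeps it from simply trusting overfit base models.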
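For reference, ACC and MCC are usually defined from the confusion-matrix counts of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN), and AUC is the area under the ROC curve. These are the standard definitions; the paper's own notation may differ.

```latex
\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN}
\qquad
\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}
```

MCC ranges from -1 to 1 and stays informative when the positive and negative classes are unbalanced, which is why it is often reported alongside ACC.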

Authors

Recommended

Review Pharmacology & Pharmacy

Single-Cell Techniques and Deep Learning in Predicting Drug Response

Zhenyu Wu, Patrick J. Lawrence, Anjun Ma, Jian Zhu, Dong Xu, Qin Ma

TRENDS IN PHARMACOLOGICAL SCIENCES (2020)

Article Cardiac & Cardiovascular Systems

Use of machine learning to classify high-risk variants of uncertain significance in lamin A/C cardiac disease

Jeffrey S. Bennett, David M. Gordon, Uddalak Majumdar, Patrick J. Lawrence, Adrianna Matos-Nieves, Katherine Myers, Anna N. Kamp, Julie C. Leonard, Kim L. McBride, Peter White, Vidu Garg

Summary: This study used a machine learning approach to predict pathogenic LMNA variants and identified a novel LMNA variant associated with conduction system disease. The results suggest that machine learning methods can assist in identifying high-risk variants of uncertain significance.

HEART RHYTHM (2022)

Article Genetics & Heredity

Exome sequencing in multiplex families with left-sided cardiac defects has high yield for disease gene discovery

David M. Gordon, David Cunningham, Gloria Zender, Patrick J. Lawrence, Jacqueline S. Penaloza, Hui Lin, Sara M. Fitzgerald-Butt, Katherine Myers, Tiffany Duong, Donald J. Corsmeier, Jeffrey B. Gaither, Harkness C. Kuck, Saranga Wijeratne, Blythe Moreland, Benjamin J. Kelly, Vidu Garg, Peter White, Kim L. McBride

Summary: This study investigates the genetic causes of congenital heart disease by studying families with multiple individuals affected by heart defects. By identifying potential disease-causing genetic variants that are common among all affected individuals, the study was able to find plausible disease-causing variants in several genes and identify new genes that may contribute to the presence of a heart defect. The findings suggest that studying families may be more effective in finding causes of heart defects than studying individuals, and that changes in multiple genes may be required for a heart defect to occur.

PLOS GENETICS (2022)

Article Biology

Multimodal pre-screening can predict BCI performance variability: A novel subject-specific experimental scheme

Seyyed Bahram Borgheai, Alyssa Hillary Zisk, John McLinden, James Mcintyre, Reza Sadjadi, Yalda Shahriari

Summary: This study proposed a novel personalized scheme that uses fNIRS and EEG as the main tools to predict and compensate for performance variability in BCI systems, especially for individuals with severe motor deficits. The predictive models showed significant associations between predicted and actual performance.

COMPUTERS IN BIOLOGY AND MEDICINE (2024)

Article Biology

Exploring a novel HE image segmentation technique for glioblastoma: A hybrid slime mould and differential evolution approach

Hongliang Guo, Hanbo Liu, Ahong Zhu, Mingyang Li, Helong Yu, Yun Zhu, Xiaoxiao Chen, Yujia Xu, Lianxing Gao, Qiongying Zhang, Yangping Shentu

Summary: In this paper, a BDSMA-based image segmentation method is proposed, which overcomes limitations of the original algorithm by combining SMA with DE and introducing a cooperative mixing model. The experimental results show that this method converges faster and achieves higher precision than other methods, and that it can be applied successfully to brain tumor medical images.

COMPUTERS IN BIOLOGY AND MEDICINE (2024)

Article Biology

Semi-supervised point consistency network for retinal artery/vein classification

Jingfei Hu, Linwei Qiu, Hua Wang, Jicong Zhang

Summary: This study proposes a novel semi-supervised point consistency network (SPC-Net) for retinal artery/vein (A/V) classification, addressing the challenges of specific tubular structures and limited well-labeled data in CNN-based approaches. The SPC-Net combines an AVC module and an MPC module, and introduces point set representations and consistency regularization to improve the accuracy of A/V classification.

COMPUTERS IN BIOLOGY AND MEDICINE (2024)

Article Biology

ConTraNet: A hybrid network for improving the classification of EEG and EMG signals with limited training data

Omair Ali, Muhammad Saif-ur-Rehman, Tobias Glasmachers, Ioannis Iossifidis, Christian Klaes

Summary: This study introduces a novel hybrid model called ConTraNet, which combines the strengths of CNN and Transformer neural networks, and achieves significant improvement in classification performance with limited training data.

COMPUTERS IN BIOLOGY AND MEDICINE (2024)

Article Biology

A novel mobile phone and tablet application for automatized calculation of pain extent

Juan Antonio Valera-Calero, Dario Lopez-Zanoni, Sandra Sanchez-Jorge, Cesar Fernandez-de-las-Penas, Marcos Jose Navarro-Santana, Sofia Olivia Calvo-Moreno, Gustavo Plaza-Manzano

Summary: This study developed an easy-to-use application for assessing the diagnostic accuracy of digital pain drawings (PDs) compared to the classic paper-and-pencil method. The results demonstrated that digital PDs have higher reliability and accuracy compared to paper-and-pencil PDs, and there were no significant differences in assessing pain extent between the two methods. The PAIN EXTENT app showed good convergent validity.

COMPUTERS IN BIOLOGY AND MEDICINE (2024)

Article Biology

Radial magnetic resonance image reconstruction with a deep unrolled projected fast iterative soft-thresholding network

Biao Qu, Jialue Zhang, Taishan Kang, Jianzhong Lin, Meijin Lin, Huajun She, Qingxia Wu, Meiyun Wang, Gaofeng Zheng

Summary: This study proposes a deep unrolled neural network, pFISTA-DR, for radial MRI image reconstruction, which successfully preserves image details using a preprocessing module, learnable convolution filters, and adaptive threshold.

COMPUTERS IN BIOLOGY AND MEDICINE (2024)

Article Biology

Improving mixed-integer temporal modeling by generating synthetic data using conditional generative adversarial networks: A case study of fluid overload prediction in the intensive care unit

Alireza Rafiei, Milad Ghiasi Rad, Andrea Sikora, Rishikesan Kamaleswaran

Summary: This study aimed to improve machine learning prediction of fluid overload by integrating synthetic data, an approach that could be extended to other clinical outcomes.

COMPUTERS IN BIOLOGY AND MEDICINE (2024)

Article Biology

Densely connected convolutional networks for ultrasound image based lesion segmentation

Jinlian Ma, Dexing Kong, Fa Wu, Lingyun Bao, Jing Yuan, Yusheng Liu

Summary: In this study, a new method based on MDenseNet is proposed for automatic segmentation of nodular lesions from ultrasound images. Experimental results demonstrate that the proposed method can accurately extract multiple nodules from thyroid and breast ultrasound images, with good accuracy and reproducibility, and it shows great potential in other clinical segmentation tasks.

COMPUTERS IN BIOLOGY AND MEDICINE (2024)

Article Biology

Multi-omics fusion with soft labeling for enhanced prediction of distant metastasis in nasopharyngeal carcinoma patients after radiotherapy

Jiabao Sheng, SaiKit Lam, Jiang Zhang, Yuanpeng Zhang, Jing Cai

Summary: Omics fusion is an important preprocessing approach in medical image processing that assists in various studies. This study aims to develop a fusion methodology for predicting distant metastasis in nasopharyngeal carcinoma by mitigating the disparities in omics data and utilizing a label-softening technique and a multi-kernel-based neural network.

COMPUTERS IN BIOLOGY AND MEDICINE (2024)

Article Biology

Regularity and variability of functional brain connectivity characteristics between gyri and sulci under naturalistic stimulus

Zhenxiang Xiao, Liang He, Boyu Zhao, Mingxin Jiang, Wei Mao, Yuzhong Chen, Tuo Zhang, Xintao Hu, Tianming Liu, Xi Jiang

Summary: This study systematically investigates the functional connectivity characteristics between gyri and sulci in the human brain under naturalistic stimulus, and identifies unique features in these connections. This research provides novel insights into the functional brain mechanism under naturalistic stimulus and lays a solid foundation for accurately mapping the brain anatomy-function relationship.

COMPUTERS IN BIOLOGY AND MEDICINE (2024)

Article Biology

Unraveling the allosteric inhibition mechanism of PARP-1 CAT and the D766/770A mutation effects via Gaussian accelerated molecular dynamics and Markov state model

Qianqian Wang, Mingyu Zhang, Aohan Li, Xiaojun Yao, Yingqing Chen

Summary: The development of PARP-1 inhibitors is crucial for the treatment of various cancers. This study investigates the structural regulation of PARP-1 by different allosteric inhibitors, revealing the basis of allosteric inhibition and providing guidance for the discovery of more innovative PARP-1 inhibitors.

COMPUTERS IN BIOLOGY AND MEDICINE (2024)

Article Biology

DualAttNet: Synergistic fusion of image-level and fine-grained disease attention for multi-label lesion detection in chest X-rays

Qing Xu, Wenting Duan

Summary: In this paper, a dual attention supervised module, named DualAttNet, is proposed for multi-label lesion detection in chest radiographs. By efficiently fusing global and local lesion classification information, the module is able to recognize targets with different sizes. Experimental results show that DualAttNet outperforms baselines in terms of mAP and AP50 with different detection architectures.

COMPUTERS IN BIOLOGY AND MEDICINE (2024)

Article Biology

Searching for significant reactions and subprocesses in models of biological systems based on Petri nets

Kaja Gutowska, Piotr Formanowicz

Summary: The primary aim of this research is to propose algorithms for identifying significant reactions and subprocesses within biological system models constructed using classical Petri nets. These solutions enable two analysis methods: importance analysis for identifying critical individual reactions to the model's functionality and occurrence analysis for finding essential subprocesses. The utility of these methods has been demonstrated through analyses of an example model related to the DNA damage response mechanism. It should be noted that these proposed analyses can be applied to any biological phenomenon represented using the Petri net formalism. The presented analysis methods extend classical Petri net-based analyses, enhancing our comprehension of the investigated biological phenomena and aiding in the identification of potential molecular targets for drugs.

COMPUTERS IN BIOLOGY AND MEDICINE (2024)

Article Biology

LDP-GAN: Generative adversarial networks with local differential privacy for patient medical records synthesis

Hansle Gwon, Imjin Ahn, Yunha Kim, Hee Jun Kang, Hyeram Seo, Heejung Choi, Ha Na Cho, Minkyoung Kim, Jiye Han, Gaeun Kee, Seohyun Park, Kye Hwa Lee, Tae Joon Jun, Young-Hak Kim

Summary: Electronic medical records have potential in advancing healthcare technologies, but privacy issues hinder their full utilization. Deep learning-based generative models can mitigate this problem by creating synthetic data similar to real patient data. However, the risk of data leakage due to malicious attacks poses a challenge to traditional generative models. To address this, we propose a method that employs local differential privacy (LDP) to protect the model from attacks and preserve the privacy of training data, while generating medical data with reasonable performance.

COMPUTERS IN BIOLOGY AND MEDICINE (2024)

Article Biology

Phase retrieval for X-ray differential phase contrast radiography with knowledge transfer learning from virtual differential absorption model

Siwei Tao, Zonghan Tian, Ling Bai, Yueshu Xu, Cuifang Kuang, Xu Liu

Summary: This study proposes a transfer learning-based method to address the phase retrieval problem in grating-based X-ray phase contrast imaging. By generating a training dataset and using deep learning techniques, this method improves image quality and can be applied to X-ray 2D and 3D imaging.

COMPUTERS IN BIOLOGY AND MEDICINE (2024)