4.6 Article

Learning from real imbalanced data of 14-3-3 proteins binding specificity

Journal

NEUROCOMPUTING
Volume 217, Pages 83-91

Publisher

ELSEVIER SCIENCE BV
DOI: 10.1016/j.neucom.2016.03.093

Keywords

Similarity-based undersampling; SMOTE-like oversampling; Locally weighted regression; Physicochemical property; Auto-cross covariance; 14-3-3 isoforms

Funding

  1. National Science Foundation of China [NSFC 61402326]
  2. Peiyang Scholar Program of Tianjin University [2016XRG-0009]
  3. Tianjin Research Program of Application Foundation and Advanced Technology [16JCQNJC00200]

The 14-3-3 proteins are a highly conserved family of homodimeric and heterodimeric molecules expressed in all eukaryotic cells. In human cells, this family consists of seven distinct but highly homologous 14-3-3 isoforms. 14-3-3 sigma is the only isoform directly linked to cancer in epithelial cells, and it is regulated by a major tumor suppressor gene. For each 14-3-3 isoform, we have 1000 peptide motifs with experimental binding affinity values. In this paper, we present a novel method for identifying peptide motifs that bind to the 14-3-3 sigma isoform. First, we select nine physicochemical properties of amino acids to describe each peptide motif. We also use auto-cross covariance to extract correlated properties of amino acids at any two positions. Then, a similarity-based undersampling approach and a SMOTE-like oversampling approach are used to deal with the imbalanced distribution of the known peptide motifs. Finally, we use locally weighted regression, which combines the simplicity of linear least squares regression with the flexibility of nonlinear regression, to predict the affinity values of peptide motifs. We test our method on the 1000 peptide motifs binding to each of the seven 14-3-3 isoforms. On the 14-3-3 sigma isoform, our method achieves an overall Pearson product-moment correlation coefficient (PCC) and root mean squared error (RMSE) of 0.83 and 258.31 for the N-terminal sublibrary, and 0.80 and 250.89 for the C-terminal sublibrary, respectively. We identify phosphopeptides that preferentially bind to 14-3-3 sigma over other isoforms. At several positions, the predicted peptide motifs carry the same amino acids as the experimentally determined substrate specificity of phosphopeptides binding to 14-3-3 sigma. Our method is a fast and reliable computational method that can be used for peptide-protein binding identification in proteomics research. © 2016 Elsevier B.V. All rights reserved.
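
To illustrate the final prediction step and the two reported metrics, here is a minimal sketch of locally weighted regression, PCC, and RMSE in plain Python. It is not the authors' implementation. The single scalar feature, the Gaussian kernel, the bandwidth `tau`, and the toy sine data are all assumptions made for this example; the paper works with multi-dimensional physicochemical descriptors, and the abstract does not state its kernel or bandwidth.

```python
import math


def lwr_predict(x0, xs, ys, tau=1.0):
    """Predict y at query x0 with a locally weighted linear fit.

    Assumption: one scalar feature and a Gaussian kernel of bandwidth tau.
    Each training point is weighted by its closeness to x0, then an
    ordinary weighted least squares line is fitted and evaluated at x0.
    """
    w = [math.exp(-((x - x0) ** 2) / (2.0 * tau ** 2)) for x in xs]
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw  # weighted mean of x
    my = sum(wi * y for wi, y in zip(w, ys)) / sw  # weighted mean of y
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    # Fall back to the weighted mean if the local x-variance is degenerate.
    slope = sxy / sxx if sxx > 1e-12 else 0.0
    return my + slope * (x0 - mx)


def pcc(a, b):
    """Pearson product-moment correlation coefficient of two sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def rmse(a, b):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


# Toy data (hypothetical): a nonlinear target that a single global line cannot fit.
xs = [i * 0.5 for i in range(21)]
ys = [math.sin(x) for x in xs]
preds = [lwr_predict(x, xs, ys, tau=0.5) for x in xs]
print("PCC:", pcc(ys, preds), "RMSE:", rmse(ys, preds))
```

Because a separate line is fitted around every query point, the model stays linear locally while following a nonlinear trend globally, which is the trade-off the abstract describes. A smaller `tau` makes the fit more local and more flexible, at the cost of higher variance.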
