Expectation pooling: an effective and interpretable pooling method for predicting DNA-protein binding

BIOINFORMATICS (2020)

Journal

BIOINFORMATICS

Volume 36, Issue 5, Pages 1405-1412

Publisher

OXFORD UNIV PRESS

DOI: 10.1093/bioinformatics/btz768

Funding

National Key Research and Development Program of China [2016YFA0502303]
National Key Basic Research Project of China [2015CB910303]
National Natural Science Foundation of China [31871342]
National Key R&D Program of China [2016YFC0901603]
China 863 Program [2015AA020108]
Beijing Advanced Innovation Center for Genomics (ICG)
State Key Laboratory of Protein and Plant Gene Research, Peking University

Abstract

Motivation: Convolutional neural networks (CNNs) have outperformed conventional methods in modeling the sequence specificity of DNA-protein binding. While previous studies have built a connection between CNNs and probabilistic models, simple CNN models cannot achieve sufficient accuracy on this problem. Recently, some neural-network methods have improved performance by using complex architectures whose results cannot be directly interpreted. However, it remains difficult to combine probabilistic models and CNNs effectively to improve DNA-protein binding predictions.

Results: In this article, we present expectation pooling, a novel global pooling method for predicting DNA-protein binding. Our pooling method stems naturally from the expectation maximization algorithm, and its benefits can be interpreted both statistically and via deep learning theory. Through experiments, we demonstrate that our pooling method improves prediction performance for DNA-protein binding. Our interpretable pooling method combines probabilistic ideas with global pooling by taking the expectations of inputs without increasing the number of parameters. We also analyze the hyperparameters in our method and propose optional structures to help fit different datasets. We explore how to effectively utilize these novel pooling methods and show that combining statistical methods with deep learning is highly beneficial, making it a promising direction for future studies in this field.
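The abstract describes expectation pooling only in words: a global pooling step that replaces each convolutional feature map with an expectation over its activations and adds no trainable parameters. The Python sketch below illustrates one plausible reading under a stated assumption: that the expectation is taken under softmax weights p_i proportional to exp(m * x_i), with m a fixed sharpness hyperparameter. That formula, the optional local-max-pooling pre-step (standing in for the abstract's "optional structures"), and the names `expectation_pool`, `local_max_pool`, and `pool_feature_maps` are hypothetical illustrations, not the authors' code.

```python
import math
from typing import List, Sequence


def expectation_pool(activations: Sequence[float], m: float = 1.0) -> float:
    """Pool one feature map into a single value by taking an expectation.

    Assumed (hypothetical) form, not quoted from the paper:
        p_i = exp(m * x_i) / sum_j exp(m * x_j)
        y   = sum_i p_i * x_i
    m is a fixed hyperparameter, so no trainable parameters are added.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    if m < 0:
        raise ValueError("m must be non-negative")
    # Shift by the maximum before exponentiating to avoid overflow; the
    # shift cancels in the ratio, so the weights p_i are unchanged.
    peak = max(activations)
    weights = [math.exp(m * (x - peak)) for x in activations]
    total = sum(weights)
    return sum(w * x for w, x in zip(weights, activations)) / total


def local_max_pool(activations: Sequence[float], window: int) -> List[float]:
    """Non-overlapping local max pooling (hypothetical optional pre-step)."""
    if window < 1:
        raise ValueError("window must be >= 1")
    return [max(activations[i:i + window]) for i in range(0, len(activations), window)]


def pool_feature_maps(
    feature_maps: Sequence[Sequence[float]], m: float = 1.0, window: int = 1
) -> List[float]:
    """Apply optional local max pooling, then expectation pooling, per channel."""
    return [expectation_pool(local_max_pool(fm, window), m) for fm in feature_maps]


if __name__ == "__main__":
    fmap = [0.1, 2.0, 0.3, 1.5]  # toy activations for one motif detector
    print(expectation_pool(fmap, m=0.0))   # same as the plain mean
    print(expectation_pool(fmap, m=50.0))  # close to max(fmap)
    print(pool_feature_maps([fmap, [1.0, 1.0, 1.0]], m=1.0, window=2))
```

Under this assumed form, m = 0 reduces to average pooling and increasing m moves the output toward the maximum activation, so a single hyperparameter interpolates between the two standard global pooling choices without adding weights. Whether the published method uses exactly this weighting should be checked against the full paper.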
