Article

Predicting cryptic ligand binding sites based on normal modes guided conformational sampling

Journal

Proteins: Structure, Function, and Bioinformatics
Volume 89, Issue 4, Pages 416-426

Publisher

WILEY
DOI: 10.1002/prot.26027

Keywords

area under the curve; conformational sampling; cryptic site; elastic network model; ligand binding; logistic regression; machine learning; neural net; normal mode analysis; random forest; receiver operating characteristic curve; SARS-CoV-2

Funding

  1. American Heart Association [17GRNT33690009]


A fast and simple conformational sampling scheme guided by normal modes can accurately predict cryptic small-molecule binding sites in target proteins. Sampling along each of the lowest 30 modes is near optimal for restructuring cryptic sites enough that existing pocket-finding programs can detect them. The method achieves high prediction accuracy, comparable to existing servers, but is much faster and simpler, making it suitable for high-throughput processing of large datasets of protein structures at the genome scale.
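The summary does not give the sampling parameters, so the following is only a minimal sketch of the general technique (normal-mode-guided sampling from an elastic network), not the authors' implementation. It assumes an anisotropic network model (ANM) built on Cα coordinates with a 15 Å cutoff and a uniform spring constant, and displaces the structure along each of the lowest 30 non-rigid-body modes to a few fixed RMSD amplitudes. The function names `anm_hessian`, `lowest_modes`, and `sample_along_modes` and the amplitude values are hypothetical; the atomistic backbone refinement, side-chain repacking, and pocket detection steps are omitted.

```python
import numpy as np


def anm_hessian(coords, cutoff=15.0, gamma=1.0):
    """Build the 3N x 3N ANM Hessian for an (N, 3) array of C-alpha coordinates."""
    n = len(coords)
    hess = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            d = coords[j] - coords[i]
            r2 = d @ d
            if r2 > cutoff ** 2:
                continue  # no spring between residues beyond the cutoff
            block = -gamma * np.outer(d, d) / r2  # off-diagonal 3x3 super-element
            hess[3 * i:3 * i + 3, 3 * j:3 * j + 3] = block
            hess[3 * j:3 * j + 3, 3 * i:3 * i + 3] = block
            # diagonal super-elements are minus the sum of the off-diagonal ones
            hess[3 * i:3 * i + 3, 3 * i:3 * i + 3] -= block
            hess[3 * j:3 * j + 3, 3 * j:3 * j + 3] -= block
    return hess


def lowest_modes(coords, n_modes=30, cutoff=15.0):
    """Return eigenvalues and unit eigenvectors of the lowest non-rigid-body modes."""
    vals, vecs = np.linalg.eigh(anm_hessian(coords, cutoff))  # ascending order
    # The first six modes are zero-frequency rigid-body translations/rotations.
    return vals[6:6 + n_modes], vecs[:, 6:6 + n_modes]


def sample_along_modes(coords, n_modes=30, rmsds=(-2.0, -1.0, 1.0, 2.0), cutoff=15.0):
    """Displace the structure along each mode separately to the given RMSDs (in Å)."""
    n = len(coords)
    _, vecs = lowest_modes(coords, n_modes, cutoff)
    confs = []
    for m in range(vecs.shape[1]):
        v = vecs[:, m].reshape(n, 3)  # unit-norm 3N vector as per-residue displacements
        for r in rmsds:
            # A unit 3N vector scaled by r*sqrt(N) gives an RMS displacement of |r|.
            confs.append((m, r, coords + r * np.sqrt(n) * v))
    return confs
```

Each returned tuple records the mode index, the signed amplitude, and the displaced coordinates, so that pocket scores from a program such as fpocket could later be traced back to the mode that opened a site. Sampling each mode on its own, rather than random combinations of modes, mirrors the "each of the lowest 30 modes" wording of the summary.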
To greatly expand the druggable genome, fast and accurate prediction of cryptic small-molecule binding sites in target proteins is in high demand. In this study, we developed a fast and simple conformational sampling scheme guided by normal modes computed from coarse-grained elastic network models, followed by atomistic backbone refinement and side-chain repacking. Despite the complex and diverse conformational changes observed upon ligand binding, we found that simply sampling along each of the lowest 30 modes is near optimal for restructuring cryptic sites enough that existing pocket-finding programs such as fpocket and concavity can detect them. We further trained machine-learning protocols to optimize the combination of the sampling-enhanced pocket scores with other dynamic and conservation scores; this improved performance only slightly. Assessed on a training set of 84 known cryptic sites and a test set of 14 proteins, our method achieved high prediction accuracy (area under the receiver operating characteristic curve > 0.8), comparable to the CryptoSite server. Compared with CryptoSite and other methods based on extensive molecular dynamics simulation, our method is much faster (1-2 hours for an average-size protein) and simpler (using only pocket scores), making it suitable for high-throughput processing of large datasets of protein structures at the genome scale.
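The abstract reports accuracy as the area under the receiver operating characteristic curve (AUC). The sketch below shows one way such an AUC can be computed, using its rank (Mann-Whitney) interpretation: the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as one half. It assumes, hypothetically, that the evaluation is per residue, that labels mark residues of known cryptic sites, and that a residue's "sampling-enhanced" pocket score is its maximum pocket score over all sampled conformations. The helper names `roc_auc` and `enhanced_scores` and the toy numbers are illustrative, not data or code from the paper.

```python
import numpy as np


def roc_auc(scores, labels):
    """ROC AUC via the Mann-Whitney statistic (ties count as 0.5)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("need at least one positive and one negative residue")
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))


def enhanced_scores(per_conf_scores):
    """Hypothetical aggregation: max pocket score per residue over conformations."""
    return np.max(np.asarray(per_conf_scores, dtype=float), axis=0)


# Toy example: rows are sampled conformations, columns are residues.
per_conf = [[0.1, 0.0, 0.2, 0.0],   # starting (apo-like) structure
            [0.0, 0.7, 0.1, 0.0],
            [0.2, 0.6, 0.0, 0.1]]
labels = [0, 1, 1, 0]               # residues 1 and 2 line the cryptic site

print(roc_auc(per_conf[0], labels))               # 0.625
print(roc_auc(enhanced_scores(per_conf), labels))  # 0.875
```

In this toy case the cryptic-site residues only score well in some sampled conformations, so taking the maximum over the ensemble raises the AUC relative to scoring the single starting structure. The pairwise comparison is quadratic in the number of residues, which is acceptable for single proteins; a sort-based rank computation would scale better for large datasets.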
