Journal
PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS
Volume 89, Issue 4, Pages 416-426
Publisher
WILEY
DOI: 10.1002/prot.26027
Keywords
area under the curve; conformational sampling; cryptic site; elastic network model; ligand binding; logistic regression; machine learning; neural net; normal mode analysis; random forest; receiver operating characteristic curve; SARS-CoV-2
Funding
- American Heart Association [17GRNT33690009]
A fast and simple conformational sampling scheme guided by normal modes can accurately predict cryptic sites for small-molecule binding in target proteins. Sampling along each of the lowest 30 modes is near optimal for restructuring cryptic sites so that existing pocket-finding programs can detect them. The method achieves prediction accuracy comparable to existing servers but is much faster and simpler, making it suitable for high-throughput processing of large datasets of protein structures at the genome scale.
To greatly expand the druggable genome, fast and accurate prediction of cryptic sites for small-molecule binding in target proteins is in high demand. In this study, we developed a fast and simple conformational sampling scheme guided by normal modes solved from coarse-grained elastic network models, followed by atomistic backbone refinement and side-chain repacking. Although the conformational changes associated with ligand binding are complex and diverse, we found that simply sampling along each of the lowest 30 modes is near optimal for restructuring cryptic sites so that they can be detected by existing pocket-finding programs such as fpocket and ConCavity. We further trained machine-learning protocols to optimize the combination of the sampling-enhanced pocket scores with other dynamic and conservation scores, which only slightly improved performance. Assessed on a training set of 84 known cryptic sites and a test set of 14 proteins, our method achieved high prediction accuracy (area under the receiver operating characteristic curve >0.8), comparable to the CryptoSite server. Compared with CryptoSite and other methods based on extensive molecular dynamics simulation, our method is much faster (1-2 hours for an average-size protein) and simpler (using only pocket scores), so it is suitable for high-throughput processing of large datasets of protein structures at the genome scale.
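The mode-guided sampling step described in the abstract can be illustrated with a minimal sketch. It assumes a generic anisotropic network model built on C-alpha coordinates, which may differ from the elastic model the authors used. The function names, the 15 Å cutoff, the uniform spring constant, and the 1 Å displacement amplitude are all illustrative assumptions, not values from the paper. The atomistic backbone refinement, side-chain repacking, and pocket scoring steps are not shown.

```python
import numpy as np


def anm_hessian(coords, cutoff=15.0, gamma=1.0):
    """Build the 3N x 3N Hessian of an anisotropic network model.

    coords: (N, 3) array of C-alpha positions (assumed input).
    cutoff, gamma: illustrative values, not taken from the paper.
    """
    n = len(coords)
    hess = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            d = coords[j] - coords[i]
            r2 = d @ d
            if r2 > cutoff ** 2:
                continue  # no spring between distant residues
            block = -gamma * np.outer(d, d) / r2
            hess[3 * i:3 * i + 3, 3 * j:3 * j + 3] = block
            hess[3 * j:3 * j + 3, 3 * i:3 * i + 3] = block
            # Diagonal blocks are minus the sum of the off-diagonal blocks.
            hess[3 * i:3 * i + 3, 3 * i:3 * i + 3] -= block
            hess[3 * j:3 * j + 3, 3 * j:3 * j + 3] -= block
    return hess


def lowest_modes(hess, n_modes=30):
    """Return the n_modes lowest non-trivial eigenvalues and eigenvectors."""
    vals, vecs = np.linalg.eigh(hess)  # eigenvalues in ascending order
    # The first six modes are rigid-body translations and rotations.
    return vals[6:6 + n_modes], vecs[:, 6:6 + n_modes]


def sample_along_modes(coords, modes, amplitude=1.0):
    """Displace the structure in both directions along each mode.

    Each displacement is scaled so that its RMSD from the input
    structure equals `amplitude` (an assumed, illustrative choice).
    """
    n = len(coords)
    conformers = []
    for k in range(modes.shape[1]):
        # A unit-norm eigenvector has RMSD 1/sqrt(n), so rescale it.
        disp = modes[:, k].reshape(n, 3) * amplitude * np.sqrt(n)
        conformers.append(coords + disp)
        conformers.append(coords - disp)
    return conformers


if __name__ == "__main__":
    # Synthetic coordinates stand in for a real protein structure.
    rng = np.random.default_rng(0)
    ca = rng.uniform(0.0, 20.0, size=(40, 3))
    _, modes = lowest_modes(anm_hessian(ca), n_modes=30)
    print(len(sample_along_modes(ca, modes)), "conformers generated")
```

In a full pipeline, each generated conformer would then be refined and passed to a pocket-finding program. Sampling each mode separately, rather than in combination, keeps the number of conformers linear in the number of modes.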