Computational chemogenomics: Is it more than inductive transfer?

JOURNAL OF COMPUTER-AIDED MOLECULAR DESIGN (2014)

Volume 28, Issue 6, Pages 597-618

Publisher

SPRINGER

DOI: 10.1007/s10822-014-9743-1

Keywords

Chemogenomics; Proteochemometrics; QSAR; Machine learning; Inductive transfer

Funding

Japanese Society for the Promotion of Science [25870336]
Funding Program for Next Generation World-Leading Researchers
CREST program of the Japan Science and Technology Agency
Chugai Pharmaceutical Co. Ltd.
Mitsui Knowledge Co. Ltd.
Grants-in-Aid for Scientific Research [25870336] Funding Source: KAKEN

Abstract

High-throughput assays challenge us to extract knowledge from multi-ligand, multi-target activity data. In QSAR, weights are statically fitted to each ligand descriptor with respect to a single endpoint or target. However, computational chemogenomics (CG) has demonstrated benefits of learning from entire grids of data at once, rather than building target-specific QSARs. A possible reason for this is the emergence of inductive knowledge transfer (IT) between targets, providing statistical robustness to the model, with no assumption about the structure of the targets. Relevant protein descriptors in CG should allow one to learn how to dynamically adjust ligand attribute weights with respect to protein structure. Hence, models built through explicit learning (EL) by including protein information, while benefitting from IT enhancement, should provide additional predictive capability, notably for protein deorphanization. This interplay between IT and EL in CG modeling is not sufficiently studied. While IT is likely to occur irrespective of the injected target information, it is not clear whether and when boosting due to EL may occur. EL is only possible if protein description is appropriate to the target set under investigation. The key issue here is the search for evidence of genuine EL exceeding expectations based on pure IT. We explore the problem in the context of Support Vector Regression, using more than 9,400 values of 31 GPCRs, where compound-protein interactions are represented by the concatenation of vectorial descriptions of compounds and proteins. This provides a unified framework to generate both IT-enhanced and potentially EL-enabled models, where the difference is toggled by supplied protein information. For EL-enabled models, protein information includes genuine protein descriptors such as typical sequence-based terms, but also the experimentally determined affinity cross-correlation fingerprints. 
The latter benchmark the expected behavior of a quasi-ideal descriptor capturing the actual functional protein-protein relatedness, and are therefore thought to be the most likely to enable EL. EL- and IT-based methods were benchmarked alongside classical QSAR, with respect to cross-validation and deorphanization challenges. A rational method for projecting benchmarked methodologies into a strategy space is given, with the aim that the projection will indicate which types of molecule design are possible using a given methodology. While EL-enabled strategies outperform classical QSARs and compare favorably to similar published results, they are, in all respects evaluated herein, not strongly distinguished from IT-enhanced models. Moreover, EL-enabled strategies failed to prove superior in deorphanization challenges. Therefore, this paper cautions that, contrary to common belief and intuitive expectation, the benefits of chemogenomics models over classical QSAR are quite possibly due less to the injection of protein-related information than to inductive transfer arising from simultaneous learning from all of the modeled endpoints. These results show that protein descriptor research needs further improvement to truly realize the expected benefit of EL.
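The unified framework described in the abstract, where each compound-protein pair is the concatenation of a compound vector and a protein vector, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and not taken from the paper: the toy descriptor values, the names `pair_vector`, `build_grid` and `leave_one_target_out`, and in particular the use of a one-hot target identity vector as the "no structural information" setting. The sketch only builds the input matrices; the resulting rows could then be passed to any Support Vector Regression implementation.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Pair = Tuple[str, str]  # (ligand id, target id)


def one_hot(index: int, size: int) -> Vector:
    """Identity-only target vector: carries no structural information."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def pair_vector(ligand_desc: Vector, protein_desc: Vector) -> Vector:
    """Represent a compound-protein pair by concatenating both vectors."""
    return list(ligand_desc) + list(protein_desc)


def build_grid(
    activities: Dict[Pair, float],
    ligand_descs: Dict[str, Vector],
    protein_descs: Dict[str, Vector],
) -> Tuple[List[Vector], List[float]]:
    """Turn sparse (ligand, target) -> activity data into X rows and y values."""
    X, y = [], []
    for (lig, tgt), value in activities.items():
        X.append(pair_vector(ligand_descs[lig], protein_descs[tgt]))
        y.append(value)
    return X, y


def leave_one_target_out(
    activities: Dict[Pair, float], orphan: str
) -> Tuple[Dict[Pair, float], Dict[Pair, float]]:
    """Deorphanization-style split: hold out every value of one target."""
    train = {k: v for k, v in activities.items() if k[1] != orphan}
    test = {k: v for k, v in activities.items() if k[1] == orphan}
    return train, test


# Hypothetical toy data (not from the paper).
ligand_descs = {"lig_a": [0.1, 0.9], "lig_b": [0.8, 0.3]}
targets = ["T1", "T2", "T3"]
identity_descs = {t: one_hot(i, len(targets)) for i, t in enumerate(targets)}
sequence_descs = {"T1": [0.2, 0.7], "T2": [0.25, 0.65], "T3": [0.9, 0.1]}
activities = {
    ("lig_a", "T1"): 7.2,
    ("lig_b", "T1"): 5.1,
    ("lig_a", "T2"): 6.9,
    ("lig_b", "T3"): 8.0,
}

# Same ligand block, different protein block: this is the "toggle".
X_it, y = build_grid(activities, ligand_descs, identity_descs)
X_el, _ = build_grid(activities, ligand_descs, sequence_descs)
train, test = leave_one_target_out(activities, "T3")
```

In this sketch the two model types differ only in the protein block appended to each row, so any difference in performance can be attributed to the supplied protein information. Under the one-hot assumption, a held-out target's identity bit is never active in the training rows, which is one reason a deorphanization split is a natural place to look for evidence of EL.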
