Article

Computational chemogenomics: Is it more than inductive transfer?

Journal

JOURNAL OF COMPUTER-AIDED MOLECULAR DESIGN
Volume 28, Issue 6, Pages 597-618

Publisher

SPRINGER
DOI: 10.1007/s10822-014-9743-1

Keywords

Chemogenomics; Proteochemometrics; QSAR; Machine learning; Inductive transfer

Funding

  1. Japan Society for the Promotion of Science [25870336]
  2. Funding Program for Next Generation World-Leading Researchers
  3. CREST program of the Japan Science and Technology Agency
  4. Chugai Pharmaceutical Co. Ltd.
  5. Mitsui Knowledge Co. Ltd.
  6. Grants-in-Aid for Scientific Research (KAKEN) [25870336]

Abstract

High-throughput assays challenge us to extract knowledge from multi-ligand, multi-target activity data. In QSAR, weights are statically fitted to each ligand descriptor with respect to a single endpoint or target. Computational chemogenomics (CG), however, has demonstrated the benefits of learning from entire grids of data at once rather than building target-specific QSARs. A possible reason is the emergence of inductive knowledge transfer (IT) between targets, which lends statistical robustness to the model without any assumption about the structure of the targets. Relevant protein descriptors in CG should allow one to learn how to dynamically adjust ligand attribute weights with respect to protein structure. Hence, models built through explicit learning (EL), which include protein information while still benefiting from IT enhancement, should provide additional predictive capability, notably for protein deorphanization.

This interplay between IT and EL in CG modeling has not been sufficiently studied. While IT is likely to occur irrespective of the injected target information, it is not clear whether and when an additional boost due to EL may occur. EL is possible only if the protein description is appropriate to the target set under investigation. The key issue here is the search for evidence of genuine EL exceeding expectations based on pure IT.

We explore the problem in the context of Support Vector Regression, using more than 9,400 values for 31 GPCRs, where compound-protein interactions are represented by the concatenation of vectorial descriptions of compounds and proteins. This provides a unified framework to generate both IT-enhanced and potentially EL-enabled models, where the difference is toggled by the supplied protein information. For EL-enabled models, protein information includes genuine protein descriptors, such as typical sequence-based terms, as well as experimentally determined affinity cross-correlation fingerprints. The latter benchmark the expected behavior of a quasi-ideal descriptor that captures the actual functional protein-protein relatedness and is therefore thought to be the most likely to enable EL. EL- and IT-based methods were benchmarked alongside classical QSAR with respect to cross-validation and deorphanization challenges. A rational method for projecting the benchmarked methodologies into a strategy space is given, with the aim that the projection will indicate the types of molecule designs possible with a given methodology.

While EL-enabled strategies outperform classical QSARs and compare favorably to similar published results, they are, in all respects evaluated herein, not strongly distinguished from IT-enhanced models. Moreover, EL-enabled strategies failed to prove superior in deorphanization challenges. This paper therefore cautions that, contrary to common belief and intuitive expectation, the benefits of chemogenomics models over classical QSAR are quite possibly due less to the injection of protein-related information than to inductive transfer arising from simultaneous learning from all of the modeled endpoints. These results show that protein descriptor research needs further improvement to truly realize the expected benefit of EL.
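
The concatenated compound-protein representation and the deorphanization split described in the abstract can be illustrated with a minimal sketch. Everything in it is hypothetical: the ligand and target names, the descriptor values, the activity numbers, and the helper functions are toy placeholders, not the authors' data or code. The sketch also assumes that an IT-only model can be obtained by supplying one-hot target identity labels in place of genuine protein descriptors; the abstract says only that the difference is toggled by the supplied protein information.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Grid = Dict[Tuple[str, str], float]  # (ligand, target) -> activity value

# Hypothetical toy data, for illustration only.
LIGANDS: Dict[str, Vector] = {"lig_a": [0.2, 1.0], "lig_b": [0.9, 0.1]}
TARGETS: List[str] = ["gpcr_1", "gpcr_2", "gpcr_3"]
ACTIVITY: Grid = {
    ("lig_a", "gpcr_1"): 6.5,
    ("lig_a", "gpcr_2"): 5.1,
    ("lig_a", "gpcr_3"): 4.9,
    ("lig_b", "gpcr_1"): 7.2,
    ("lig_b", "gpcr_3"): 8.0,
}
# Hypothetical "genuine" protein descriptors (e.g. sequence-based terms).
SEQ_DESC: Dict[str, Vector] = {
    "gpcr_1": [0.1, 0.7],
    "gpcr_2": [0.2, 0.6],
    "gpcr_3": [0.8, 0.3],
}


def identity_fingerprints(targets: List[str]) -> Dict[str, Vector]:
    """One-hot target labels (assumed IT-only setup): no structural information."""
    n = len(targets)
    return {t: [1.0 if i == j else 0.0 for j in range(n)]
            for i, t in enumerate(targets)}


def pair_vector(ligand_desc: Vector, protein_desc: Vector) -> Vector:
    """Represent a compound-protein pair by concatenating both descriptions."""
    return list(ligand_desc) + list(protein_desc)


def build_rows(activity: Grid, ligands: Dict[str, Vector],
               protein_desc: Dict[str, Vector]) -> Tuple[List[Vector], List[float]]:
    """Turn the activity grid into (X, y) rows for any regression learner."""
    X, y = [], []
    for (lig, tgt), value in sorted(activity.items()):
        X.append(pair_vector(ligands[lig], protein_desc[tgt]))
        y.append(value)
    return X, y


def leave_one_target_out(activity: Grid, orphan: str) -> Tuple[Grid, Grid]:
    """Deorphanization-style split: hold out every measurement of one target."""
    train = {k: v for k, v in activity.items() if k[1] != orphan}
    test = {k: v for k, v in activity.items() if k[1] == orphan}
    return train, test


# Same ligand block, different protein block: the toggle between the two setups.
X_it, y_it = build_rows(ACTIVITY, LIGANDS, identity_fingerprints(TARGETS))
X_el, y_el = build_rows(ACTIVITY, LIGANDS, SEQ_DESC)
train, test = leave_one_target_out(ACTIVITY, "gpcr_3")
```

The resulting `X` and `y` lists could be handed to any Support Vector Regression implementation; fitting is left out so that the sketch depends only on the standard library. In this toy setup, the held-out target's one-hot column is never set in the training rows, whereas its descriptor values in `SEQ_DESC` lie in the same space as those of the training targets.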


Recommended

Article Biochemical Research Methods

HIV-1 drug resistance profiling using amino acid sequence space cartography

Karina Pikalyova, Alexey Orlov, Arkadii Lin, Olga Tarasova, Gilles Marcou, Dragos Horvath, Vladimir Poroikov, Alexandre Varnek

Summary: A new methodology based on generative topographic mapping (GTM) was introduced for predicting the drug resistance of HIV strains. The approach combines high accuracy and interpretability, allowing for visualization and analysis of sequence space and treatment optimization. Several case studies demonstrate the practicality of this method.

BIOINFORMATICS (2022)

Article Chemistry, Medicinal

SynthI: A New Open-Source Tool for Synthon-Based Library Design

Yuliana Zabolotna, Dmitriy M. Volochnyuk, Sergey Ryabukhin, Kostiantyn Gavrylenko, Dragos Horvath, Olga Klimchuk, Oleksandr Oksiuta, Gilles Marcou, Alexandre Varnek

Summary: Most existing computational tools for de novo library design focus on generating, selecting, and combining structural motifs to form new library members. However, these approaches appear to be more theoretical and disconnected from reality due to the lack of a direct link between the chemical space of the retrosynthesized fragments and the pool of available reagents. This paper presents a new open-source toolkit called Synthons Interpreter (SynthI), which merges these two chemical spaces into a single synthons space.

JOURNAL OF CHEMICAL INFORMATION AND MODELING (2022)

Article Chemistry, Medicinal

A Close-up Look at the Chemical Space of Commercially Available Building Blocks for Medicinal Chemistry

Yuliana Zabolotna, Dmitriy M. Volochnyuk, Sergey V. Ryabukhin, Dragos Horvath, Konstantin S. Gavrilenko, Gilles Marcou, Yurii S. Moroz, Oleksandr Oksiuta, Alexandre Varnek

Summary: Efficient synthesis of desired compounds is crucial for chemical space exploration in drug discovery, which is influenced by both established synthetic protocols and the availability of corresponding building blocks (BBs). This study analyzes the chemical space of 400,000 purchasable BBs, examining their physicochemical properties and diversity to assess their coverage of medicinal chemistry needs. The analysis is based on a universal topographic map that visualizes libraries and their differences in coverage.

JOURNAL OF CHEMICAL INFORMATION AND MODELING (2022)

Article Chemistry, Medicinal

Exploration of the Chemical Space of DNA-encoded Libraries

Regina Pikalyova, Yuliana Zabolotna, Dmitriy M. Volochnyuk, Dragos Horvath, Gilles Marcou, Alexandre Varnek

Summary: DNA-Encoded Library (DEL) technology is a method for discovering bioactive molecules in medicinal chemistry. This project aimed to generate and analyze an ultra-large chemical space of DEL using commercially available building blocks. The study compared the DEL compounds to biologically relevant compounds from ChEMBL and identified optimal DELs covering the chemical space of ChEMBL. Different combinations of DELs were analyzed to achieve even higher coverage of ChEMBL than with a single DEL.

MOLECULAR INFORMATICS (2022)

Article Chemistry, Multidisciplinary

Maytansinol Functionalization: Towards Useful Probes for Studying Microtubule Dynamics

Zlata Boiarska, Helena Perez-Pena, Anne-Catherine Abel, Paola Marzullo, Beatriz Alvarez-Bernad, Francesca Bonato, Benedetta Santini, Dragos Horvath, Daniel Lucena-Agell, Francesca Vasile, Maurizio Sironi, J. Fernando Diaz, Andrea E. Prota, Stefano Pieraccini, Daniele Passarella

Summary: Maytansinoids, a potent class of tubulin binders with cytotoxic activity, have been limited in their application as cytotoxins and chemical probes due to the complexity of natural product chemistry. In this study, the synthesis of long-chain derivatives and maytansinoid conjugates was reported, confirming that bulky substituents do not affect their activity or binding mode. These results provide new opportunities for the design of maytansine-based probes.

CHEMISTRY-A EUROPEAN JOURNAL (2023)

Article Chemistry, Medicinal

French dispatch: GTM-based analysis of the Chimiotheque Nationale Chemical Space

Polina Oleneva, Yuliana Zabolotna, Dragos Horvath, Gilles Marcou, Fanny Bonachera, Alexandre Varnek

Summary: The Chimiotheque Nationale (CN) was compared with ZINC and ChEMBL to analyze its screening and biologically relevant compounds, including chemical space coverage, physicochemical properties, and Bemis-Murcko scaffold populations. Over 5,000 CN-unique scaffolds were identified. Generative Topographic Maps (GTMs) were generated to compare compound populations. Hierarchical GTM ("zooming") was used to create an ensemble of maps at different resolutions, from global overview to individual structure mapping. These maps were added to the ChemSpace Atlas website. The analysis of synthetic accessibility showed that only 29.7% of CN compounds can be fully synthesized using commercially available building blocks.

MOLECULAR INFORMATICS (2023)

Review Biochemistry & Molecular Biology

Computational Approaches to the Rational Design of Tubulin-Targeting Agents

Helena Perez-Pena, Anne-Catherine Abel, Maxim Shevelev, Andrea E. E. Prota, Stefano Pieraccini, Dragos Horvath

Summary: Microtubules are essential in cellular processes and have potential as targets for cancer and neurodegeneration research. However, current tubulin binders have limitations, making the discovery of safer and more efficient agents necessary. Computer-aided design techniques and accessible tubulin-ligand structures can aid in the selection and design of new tubulin-targeting agents.

BIOMOLECULES (2023)

Article Chemistry, Medicinal

GENERA: A Combined Genetic/Deep-Learning Algorithm for Multiobjective Target-Oriented De Novo Design

Giuseppe Lamanna, Pietro Delre, Gilles Marcou, Michele Saviano, Alexandre Varnek, Dragos Horvath, Giuseppe Felice Mangiatordi

Summary: This study introduces a new de novo design algorithm, GENERA, which combines DeLA-Drug, a deep-learning algorithm for automated drug-like analogue design, with a genetic algorithm for generating molecules with desired target-oriented properties. GENERA was applied to the angiotensin-converting enzyme 2 (ACE2) target, and its ability to design promising candidates de novo was assessed using the docking programs PLANTS and GLIDE. The study demonstrates that GENERA can effectively perform multiobjective optimization and generate focused libraries with better scores than a starting set of known ACE2 binders.

JOURNAL OF CHEMICAL INFORMATION AND MODELING (2023)

Article Chemistry, Medicinal

Chemical Library Space: Definition and DNA-Encoded Library Comparison Study Case

Regina Pikalyova, Yuliana Zabolotna, Dragos Horvath, Gilles Marcou, Alexandre Varnek

Summary: The development of DNA-encoded library (DEL) technology has brought new challenges to the analysis of chemical libraries. This study introduces the concept of chemical library space (CLS) and compares four representations obtained using generative topographic mapping. These encodings allow for effective comparison of libraries and fine-tuning of matching criteria. The proposed CLS can be used for efficient analysis and selection of chemical libraries.

JOURNAL OF CHEMICAL INFORMATION AND MODELING (2023)

Article Chemistry, Medicinal

Meta-GTM: Visualization and Analysis of the Chemical Library Space

Regina Pikalyova, Yuliana Zabolotna, Dragos Horvath, Gilles Marcou, Alexandre Varnek

Summary: In chemical library analysis, it can be beneficial to describe libraries as individual items rather than collections of compounds. This is especially true for large non-selectable compound mixtures like DNA-encoded libraries (DELs). The chemical library space (CLS) is useful for managing a portfolio of libraries, similar to how chemical space (CS) helps manage portfolios of molecules. Mapping the CLS on meta-GTMs allows for analysis beyond pairwise library comparison, facilitating the selection of the most suitable libraries for specific projects.

JOURNAL OF CHEMICAL INFORMATION AND MODELING (2023)

Article Biochemistry & Molecular Biology

Comprehensive analysis of commercial fragment libraries

Julia Revillo Imbernon, Celien Jacquemard, Guillaume Bret, Gilles Marcou, Esther Kellenberger

Summary: The screening of fragment libraries is crucial in drug discovery, with the success depending on the quality and design of the library meeting specific research requirements. This study conducted an inventory of commercial fragment libraries and developed a methodology to classify any library based on its similarity, coverage, and structural features, leading to the creation of a model that considers fragment diversity and ease of interpretation.

RSC MEDICINAL CHEMISTRY (2022)
