4.7 Article

Bidirectional Molecule Generation with Recurrent Neural Networks

Journal

JOURNAL OF CHEMICAL INFORMATION AND MODELING
Volume 60, Issue 3, Pages 1175-1183

Publisher

AMER CHEMICAL SOC
DOI: 10.1021/acs.jcim.9b00943

Funding

  1. Swiss National Science Foundation (SNSF) [205321_182176]
  2. Novartis Forschungsstiftung
  3. ETH RETHINK Initiative

Recurrent neural networks (RNNs) can generate de novo molecular designs using simplified molecular input line entry system (SMILES) string representations of chemical structures. RNN-based structure generation is usually performed unidirectionally, by growing SMILES strings from left to right. However, a small molecule has no natural start or end, and SMILES strings are intrinsically nonunivocal representations of molecular graphs. These properties motivate bidirectional structure generation. Here, bidirectional generative RNNs for SMILES-based molecule design are introduced. To this end, two established bidirectional methods were implemented, and a new method for SMILES string generation and data augmentation is introduced: bidirectional molecule design by alternate learning (BIMODAL). These three bidirectional strategies were compared to the unidirectional forward RNN approach for SMILES string generation in terms of the (i) novelty, (ii) scaffold diversity, and (iii) chemical-biological relevance of the computer-generated molecules. The results support bidirectional strategies for SMILES-based molecular de novo design, with BIMODAL outperforming the unidirectional forward RNN on most of the criteria under the tested conditions. The code for the methods and the pretrained models is available at https://github.com/ETHmodlab/BIMODAL.
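
To make the idea of bidirectional string growth concrete, here is a minimal toy sketch in Python. It is not the authors' BIMODAL implementation (see the linked repository for that). It only shows the general pattern of growing a token sequence outward from a central start token by alternating between the right and left ends. The vocabulary, the "G" start token, the "E" end token, and the functions toy_next_token and grow_bidirectionally are assumptions made for illustration. A uniform random sampler stands in for a trained RNN, so the output will usually not be a valid SMILES string.

```python
import random

# Hypothetical toy vocabulary; "E" marks that one end of the string is finished.
VOCAB = ["C", "N", "O", "c", "1", "(", ")", "=", "E"]


def toy_next_token(context, side, rng):
    """Stand-in for a trained model (assumption): ignores the context
    and samples a token uniformly at random."""
    return rng.choice(VOCAB)


def grow_bidirectionally(max_len=20, seed=0):
    """Grow a token list outward from a central start token "G",
    alternating between the right and the left end."""
    rng = random.Random(seed)
    tokens = ["G"]
    done = {"left": False, "right": False}
    side = "right"
    while not all(done.values()) and len(tokens) < max_len:
        if not done[side]:
            tok = toy_next_token(tokens, side, rng)
            if tok == "E":
                done[side] = True      # this end is closed
            elif side == "right":
                tokens.append(tok)     # extend to the right
            else:
                tokens.insert(0, tok)  # extend to the left
        # switch sides after every step
        side = "left" if side == "right" else "right"
    return tokens


if __name__ == "__main__":
    tokens = grow_bidirectionally(max_len=20, seed=0)
    # Drop the start token to obtain the generated string.
    print("".join(t for t in tokens if t != "G"))
```

In a real system, the sampler would be replaced by a model conditioned on the tokens generated so far on both sides, and the resulting strings would be checked for chemical validity before computing measures such as novelty or scaffold diversity.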

Recommended

Article Automation & Control Systems

Predicting molecular activity on nuclear receptors by multitask neural networks

Cecile Valsecchi, Magda Collarile, Francesca Grisoni, Roberto Todeschini, Davide Ballabio, Viviana Consonni

Summary: Interest in multitask and deep learning strategies for quantitative structure-activity relationship (QSAR) analysis has been increasing. In this study, the binary classification capability of multitask deep and shallow neural networks was compared to single-task strategies and other benchmark methods. The results showed that multitask learning is beneficial for tasks that are less represented, and multitask deep learning strategies performed similarly to some single-task approaches.

JOURNAL OF CHEMOMETRICS (2022)

Review Pharmacology & Pharmacy

Artificial intelligence in drug discovery: recent advances and future perspectives

Jose Jimenez-Luna, Francesca Grisoni, Nils Weskamp, Gisbert Schneider

Summary: This article reviews the current status of AI in chemoinformatics, discussing topics such as quantitative structure-activity/property relationship and structure-based modeling, de novo molecular design, and chemical synthesis prediction. The advantages and limitations of current deep learning applications are highlighted, offering a perspective on next-generation AI for drug discovery.

EXPERT OPINION ON DRUG DISCOVERY (2021)

Article Chemistry, Multidisciplinary

Beam Search for Automated Design and Scoring of Novel ROR Ligands with Machine Intelligence

Michael Moret, Moritz Helmstaedter, Francesca Grisoni, Gisbert Schneider, Daniel Merk

Summary: Chemical language models coupled with the beam search algorithm were used to automate molecule design and scoring, resulting in the discovery of novel inverse agonists for retinoic acid receptor-related orphan receptors (RORs). These designs were synthesizable in three reaction steps and exhibited low-micromolar to nanomolar potency towards RORγ, showcasing the potential of generative artificial intelligence in data-driven drug discovery.

ANGEWANDTE CHEMIE-INTERNATIONAL EDITION (2021)

Article Multidisciplinary Sciences

Combining generative artificial intelligence and on-chip synthesis for de novo drug design

Francesca Grisoni, Berend J. H. Huisman, Alexander L. Button, Michael Moret, Kenneth Atz, Daniel Merk, Gisbert Schneider

Summary: Automating the molecular design-make-test-analyze cycle has led to successful generation of potent LXR agonists, confirming the applicability of the proposed framework for automated drug design.

SCIENCE ADVANCES (2021)

Article Chemistry, Medicinal

Perplexity-Based Molecule Ranking and Bias Estimation of Chemical Language Models

Michael Moret, Francesca Grisoni, Paul Katzberger, Gisbert Schneider

Summary: Chemical language models (CLMs) are useful for designing molecules with desired properties. This study introduces the perplexity metric to evaluate the generated molecules' similarity to the design objectives, ranking the promising designs. The perplexity scoring also helps identify and remove undesired biases in the model training process.

JOURNAL OF CHEMICAL INFORMATION AND MODELING (2022)

Article Chemistry, Medicinal

Exposing the Limitations of Molecular Machine Learning with Activity Cliffs

Derek van Tilborg, Alisa Alenicheva, Francesca Grisoni

Summary: Machine learning plays a crucial role in drug discovery and chemistry. However, the effect of activity cliffs - molecules that are structurally similar but exhibit significant differences in potency - on model performance has received limited attention. In this study, we benchmarked 24 machine and deep learning approaches and found that machine learning methods based on molecular descriptors outperformed more complex deep learning methods in predicting the properties of activity cliffs. Our findings highlight the need for dedicated metrics and novel algorithms to address the limitation posed by activity cliffs in molecular machine learning models.

JOURNAL OF CHEMICAL INFORMATION AND MODELING (2022)

Article Multidisciplinary Sciences

Leveraging molecular structure and bioactivity with chemical language models for de novo drug design

Michael Moret, Irene Pachon Angona, Leandro Cotos, Shen Yan, Kenneth Atz, Cyrill Brunner, Martin Baumgartner, Francesca Grisoni, Gisbert Schneider

Summary: Generative chemical language models (CLMs) can be used to generate new molecular structures from a textual representation. Hybrid CLMs can leverage bioactivity information for training compounds. In this study, a virtual compound library was created using a generative CLM and refined using a CLM-based classifier for bioactivity prediction. A new PI3K gamma ligand with sub-micromolar activity was identified, highlighting the potential of hybrid CLMs for molecular design.

NATURE COMMUNICATIONS (2023)

Review Biochemistry & Molecular Biology

Structure-Based Drug Discovery with Deep Learning

R. Ozcelik, D. van Tilborg, J. Jimenez-Luna, F. Grisoni

Summary: Artificial intelligence (AI) in the form of deep learning is promising for drug discovery and chemical biology, especially in protein structure prediction, organic synthesis planning, and molecule design. While most efforts have focused on ligand-based approaches, structure-based drug discovery has the potential to address unsolved challenges such as affinity prediction for new protein targets and understanding chemical kinetic properties. Advances in deep learning methodologies and accurate protein structure predictions support a resurgence in structure-based approaches guided by AI. This review summarizes key algorithmic concepts in structure-based deep learning for drug discovery and discusses future opportunities, applications, and challenges.

CHEMBIOCHEM (2023)

Article Biochemistry & Molecular Biology

Chemical language models for de novo drug design: Challenges and opportunities

Francesca Grisoni

Summary: Generative deep learning is revolutionizing de novo drug design by enabling the generation of molecules with specific properties. Chemical language models, which use deep learning to generate new molecules as strings, have been remarkably successful in this endeavor. With advances in natural language processing and interdisciplinary collaborations, chemical language models are expected to play a key role in the future of drug discovery.

CURRENT OPINION IN STRUCTURAL BIOLOGY (2023)

Correction Chemistry, Medicinal

Exposing the Limitations of Molecular Machine Learning with Activity Cliffs (vol 62, pg 5938, 2022)

Derek van Tilborg, Alisa Alenicheva, Francesca Grisoni

JOURNAL OF CHEMICAL INFORMATION AND MODELING (2023)

Article Chemistry, Medicinal

De Novo Design of Nurr1 Agonists via Fragment-Augmented Generative Deep Learning in Low-Data Regime

Marco Ballarotto, Sabine Willems, Tanja Stiller, Felix Nawa, Julian A. A. Marschner, Francesca Grisoni, Daniel Merk

Summary: Generative neural networks trained on SMILES can design innovative bioactive molecules de novo. These models have usually been fine-tuned on template molecules but it is challenging to apply them to orphan targets with few known ligands.

JOURNAL OF MEDICINAL CHEMISTRY (2023)

Review Biotechnology & Applied Microbiology

Artificial intelligence for natural product drug discovery

Michael W. Mullowney, Katherine R. Duncan, Somayah S. Elsayed, Neha Garg, Justin J. J. van der Hooft, Nathaniel I. Martin, David Meijer, Barbara R. Terlouw, Friederike Biermann, Kai Blin, Janani Durairaj, Marina Gorostiola Gonzalez, Eric J. N. Helfrich, Florian Huber, Stefan Leopold-Messer, Kohulan Rajan, Tristan de Rond, Jeffrey A. van Santen, Maria Sorokina, Marcy J. Balunas, Mehdi A. Beniddir, Doris A. van Bergeijk, Laura M. Carroll, Chase M. Clark, Djork-Arne Clevert, Chris A. Dejong, Chao Du, Scarlet Ferrinho, Francesca Grisoni, Albert Hofstetter, Willem Jespers, Olga V. Kalinina, Satria A. Kautsar, Hyunwoo Kim, Tiago F. Leao, Joleen Masschelein, Evan R. Rees, Raphael Reher, Daniel Reker, Philippe Schwaller, Marwin Segler, Michael A. Skinnider, Allison S. Walker, Egon L. Willighagen, Barbara Zdrazil, Nadine Ziemert, Rebecca J. M. Goss, Pierre Guyomard, Andrea Volkamer, William H. Gerwick, Hyun Uk Kim, Rolf Mueller, Gilles P. van Wezel, Gerard J. P. van Westen, Anna K. H. Hirsch, Roger G. Linington, Serina L. Robinson, Marnix H. Medema

Summary: The developments in computational omics technologies in combination with artificial intelligence approaches have opened up new possibilities for drug discovery. However, addressing key challenges such as high-quality datasets and algorithm validation is essential to realize the potential of these synergies.

NATURE REVIEWS DRUG DISCOVERY (2023)

Article Chemistry, Multidisciplinary

Identification of fluorescently-barcoded nanoparticles using machine learning

Ana Ortiz-Perez, Cristina Izquierdo-Lozano, Rens Meijers, Francesca Grisoni, Lorenzo Albertazzi

Summary: Barcoding is a powerful tool to distinguish multiple targets within a complex mixture and increase assay throughput. While fluorescent barcoding of microparticles is widely used, it is more challenging for nanoparticles due to their small size and heterogeneity. In this study, a machine-learning-assisted workflow was developed to write, read, and classify barcoded PLGA-PEG nanoparticles at a single-particle level.

NANOSCALE ADVANCES (2023)

Article Environmental Sciences

CATMoS: Collaborative Acute Toxicity Modeling Suite

Kamel Mansouri, Agnes L. Karmaus, Jeremy Fitzpatrick, Grace Patlewicz, Prachi Pradeep, Domenico Alberga, Nathalie Alepee, Timothy E. H. Allen, Dave Allen, Vinicius M. Alves, Carolina H. Andrade, Tyler R. Auernhammer, Davide Ballabio, Shannon Bell, Emilio Benfenati, Sudin Bhattacharya, Joyce Bastos, Stephen Boyd, J. B. Brown, Stephen J. Capuzzi, Yaroslav Chushak, Heather Ciallella, Alex M. Clark, Viviana Consonni, Pankaj R. Daga, Sean Ekins, Sherif Farag, Maxim Fedorov, Denis Fourches, Domenico Gadaleta, Feng Gao, Jeffery M. Gearhart, Garett Goh, Jonathan M. Goodman, Francesca Grisoni, Christopher M. Grulke, Thomas Hartung, Matthew Hirn, Pavel Karpov, Alexandru Korotcov, Giovanna J. Lavado, Michael Lawless, Xinhao Li, Thomas Luechtefeld, Filippo Lunghini, Giuseppe F. Mangiatordi, Gilles Marcou, Dan Marsh, Todd Martin, Andrea Mauri, Eugene N. Muratov, Glenn J. Myatt, Dac-Trung Nguyen, Orazio Nicolotti, Reine Note, Paritosh Pande, Amanda K. Parks, Tyler Peryea, Ahsan H. Polash, Robert Rallo, Alessandra Roncaglioni, Craig Rowlands, Patricia Ruiz, Daniel P. Russo, Ahmed Sayed, Risa Sayre, Timothy Sheils, Charles Siegel, Arthur C. Silva, Anton Simeonov, Sergey Sosnin, Noel Southall, Judy Strickland, Yun Tang, Brian Teppen, Igor Tetko, Dennis Thomas, Valery Tkachenko, Roberto Todeschini, Cosimo Toma, Ignacio Tripodi, Daniela Trisciuzzi, Alexander Tropsha, Alexandre Varnek, Kristijan Vukovic, Zhongyu Wang, Liguo Wang, Katrina M. Waters, Andrew J. Wedlake, Sanjeeva J. Wijeyesakere, Dan Wilson, Zijun Xiao, Hongbin Yang, Gergely Zahoranszky-Kohalmi, Alexey Zakharov, Fagen F. Zhang, Zhen Zhang, Tongan Zhao, Hao Zhu, Kimberley M. Zorn, Warren Casey, Nicole C. Kleinstreuer

Summary: The international collaboration in developing in silico models for predicting acute oral toxicity, resulting in the CATMoS, has demonstrated high performance in terms of accuracy and robustness. This modeling suite is being evaluated by regulatory agencies as a potential replacement for in vivo rat acute oral toxicity studies.

ENVIRONMENTAL HEALTH PERSPECTIVES (2021)

Article Chemistry, Multidisciplinary

Beam Search for Automated Design and Scoring of Novel ROR Ligands with Machine Intelligence

Michael Moret, Moritz Helmstaedter, Francesca Grisoni, Gisbert Schneider, Daniel Merk

Summary: Chemical language models combined with the beam search algorithm as an automated molecule design and scoring technique can generate novel compounds with potential bioactivity. The newly discovered inverse agonists can be synthesized in a few simple reaction steps and exhibit low-micromolar to nanomolar potency towards RORγ. This model-intrinsic sampling technique eliminates the strict need for external compound scoring functions, further expanding the applicability of generative artificial intelligence to data-driven drug discovery.

ANGEWANDTE CHEMIE-INTERNATIONAL EDITION (2021)
