Article Data Paper

GEOM, energy-annotated molecular conformations for property prediction and molecular generation

Journal

SCIENTIFIC DATA
Volume 9, Issue 1, Pages -

Publisher

NATURE PORTFOLIO
DOI: 10.1038/s41597-022-01288-4

Keywords

-

Funding

  1. XSEDE COVID-19 HPC Consortium [CHE200039]
  2. LBNL National Energy Research Scientific Computing Center (NERSC)
  3. DARPA [HR00111920025]
  4. MIT-IBM Watson AI Lab
  5. NASA Advanced Supercomputing (NAS) Division
  6. MIT Engaging cluster
  7. Harvard Cannon cluster
  8. MIT Lincoln Lab Supercloud clusters


Machine learning (ML) outperforms traditional approaches in many molecular design tasks. However, most ML models predict molecular properties only from 2D chemical graphs or single 3D structures, neglecting the ensemble of 3D conformers accessible to a molecule. This article introduces GEOM, a large-scale dataset of accurate conformers annotated with experimental data, intended to support models that predict properties from conformer ensembles and generative models that sample 3D conformations.
Machine learning (ML) outperforms traditional approaches in many molecular design tasks. ML models usually predict molecular properties from a 2D chemical graph or a single 3D structure, but neither of these representations accounts for the ensemble of 3D conformers that are accessible to a molecule. Property prediction could be improved by using conformer ensembles as input, but there is no large-scale dataset that contains graphs annotated with accurate conformers and experimental data. Here we use advanced sampling and semi-empirical density functional theory (DFT) to generate 37 million molecular conformations for over 450,000 molecules. The Geometric Ensemble Of Molecules (GEOM) dataset contains conformers for 133,000 species from QM9, and 317,000 species with experimental data related to biophysics, physiology, and physical chemistry. Ensembles of 1,511 species with BACE-1 inhibition data are also labeled with high-quality DFT free energies in an implicit water solvent, and 534 ensembles are further optimized with DFT. GEOM will assist in the development of models that predict properties from conformer ensembles, and generative models that sample 3D conformations.
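Using a conformer ensemble as model input typically means weighting each conformer by its thermal population. The sketch below is a minimal illustration, assuming relative conformer energies in kcal/mol at 298.15 K; the function names and the example energies and property values are hypothetical and are not GEOM's file format, API, or data.

```python
import math
from typing import Sequence

# Boltzmann constant in kcal/(mol*K)
K_B_KCAL = 0.0019872041


def boltzmann_weights(energies_kcal: Sequence[float], temperature: float = 298.15) -> list[float]:
    """Return normalized Boltzmann populations for a conformer ensemble.

    Energies are assumed to be in kcal/mol. Only energy differences matter,
    so the minimum is subtracted first to avoid overflow/underflow in exp().
    """
    if not energies_kcal:
        raise ValueError("ensemble must contain at least one conformer")
    kt = K_B_KCAL * temperature
    e_min = min(energies_kcal)
    factors = [math.exp(-(e - e_min) / kt) for e in energies_kcal]
    total = sum(factors)
    return [f / total for f in factors]


def ensemble_average(values: Sequence[float], weights: Sequence[float]) -> float:
    """Boltzmann-weighted average of a per-conformer property."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(v * w for v, w in zip(values, weights))


if __name__ == "__main__":
    # Hypothetical three-conformer ensemble (relative energies, kcal/mol)
    energies = [0.0, 0.5, 2.0]
    # Hypothetical per-conformer property, e.g. a dipole moment in debye
    dipoles = [1.2, 2.0, 3.1]
    w = boltzmann_weights(energies)
    print([round(x, 3) for x in w])
    print(round(ensemble_average(dipoles, w), 3))
```

Subtracting the lowest energy before exponentiating leaves the normalized weights unchanged while keeping the exponentials in a safe numerical range, which matters for large ensembles with wide energy spreads.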


Reviews

Primary rating

4.7
Not enough ratings

Secondary ratings

Novelty
-
Significance
-
Scientific rigor
-

Recommended

Article Chemistry, Multidisciplinary

Bottlebrush polymers with flexible enantiomeric side chains display differential biological properties

Hung V. -T. Nguyen, Yivan Jiang, Somesh Mohapatra, Wencong Wang, Jonathan C. Barnes, Nathan J. Oldenhuis, Kathleen K. Chen, Simon Axelrod, Zhihao Huang, Qixian Chen, Matthew R. Golder, Katherine Young, Dylan Suvlu, Yizhi Shen, Adam P. Willard, Michael J. A. Hore, Rafael Gomez-Bombarelli, Jeremiah A. Johnson

Summary: This study synthesized water-soluble chiral bottlebrush polymers from macromonomers of differing rigidity and found that polymers with conformationally flexible mirror-image side chains differed markedly in their properties, more so than polymers with comparably rigid mirror-image side chains. The authors rationalized these observations by correlating greater conformational freedom with enhanced chiral recognition, providing insights for the design of future biomaterials.

NATURE CHEMISTRY (2022)

Article Chemistry, Multidisciplinary

Multi-fidelity prediction of molecular optical peaks with deep learning

Kevin P. Greenman, William H. Green, Rafael Gomez-Bombarelli

Summary: Optical properties play a central role in molecular design for many applications, and existing theoretical and statistical methods for predicting them trade off accuracy, generality, and cost. This study uses neural networks to predict molecular absorption peaks in solution, achieving higher accuracy and generalizability through a multi-fidelity approach based on an auxiliary model.

CHEMICAL SCIENCE (2022)

Article Chemistry, Multidisciplinary

Tunable CHA/AEI Zeolite Intergrowths with A Priori Biselective Organic Structure-Directing Agents: Controlling Enrichment and Implications for Selective Catalytic Reduction of NOx

Estefania Bello-Jurado, Daniel Schwalbe-Koda, Mathias Nero, Cecilia Paris, Toni Uusimaki, Yuriy Roman-Leshkov, Avelino Corma, Tom Willhammar, Rafael Gomez-Bombarelli, Manuel Moliner

Summary: A novel methodology based on high-throughput simulations has been developed to design unique biselective organic structure-directing agents (OSDAs) that enable the efficient synthesis of CHA/AEI zeolite intergrowth materials with controlled phase compositions. These materials exhibit outstanding catalytic performance and hydrothermal stability, surpassing even the performance of commercial CHA-type catalysts. This methodology opens up possibilities for synthesizing new zeolite intergrowth materials with more complex structures and unique catalytic properties.

ANGEWANDTE CHEMIE-INTERNATIONAL EDITION (2022)

Article Chemistry, Physical

Sampling lattices in semi-grand canonical ensemble with autoregressive machine learning

James Damewood, Daniel Schwalbe-Koda, Rafael Gomez-Bombarelli

Summary: Efficient and accurate calculation of thermodynamic potentials and observables is crucial for applying statistical mechanics simulations in materials science. Naive Monte Carlo methods struggle to meet the computational demands of complex materials, so this work adapts autoregressive machine learning generative models to the semi-grand canonical ensemble. The resulting models are transferable across thermodynamic conditions and can be used with various internal energy models.

NPJ COMPUTATIONAL MATERIALS (2022)

Article Chemistry, Physical

Learning pair potentials using differentiable simulations

Wujie Wang, Zhenghao Wu, Johannes C. B. Dietschreit, Rafael Gomez-Bombarelli

Summary: This study proposes DiffSim, a general stochastic method that learns pair interactions from data using differentiable simulations. It combines molecular dynamics simulations with stochastic gradient descent to learn interaction potentials directly from structural observables. DiffSim can explore a wider functional space of pair potentials than traditional methods such as iterative Boltzmann inversion, and it can simulate and optimize multiple systems at once, for example at different temperatures or compositions, to improve transferability.

JOURNAL OF CHEMICAL PHYSICS (2023)

Article Chemistry, Multidisciplinary

Thermal Half-Lives of Azobenzene Derivatives: Virtual Screening Based on Intersystem Crossing Using a Machine Learning Potential

Simon Axelrod, Eugene Shakhnovich, Rafael Gomez-Bombarelli

Summary: This article introduces a computational tool for predicting the thermal half-lives of azobenzene derivatives, which are key photoswitches in light-activated drugs. Using machine learning and quantum chemistry data, the authors automated the prediction of thermal half-lives for 19,000 azobenzene derivatives and explored trends and trade-offs between barriers and absorption wavelengths.

NANO LETTERS (2023)

Article Chemistry, Multidisciplinary

Chemistry-Informed Machine Learning for Polymer Electrolyte Discovery

Gabriel Bradford, Jeffrey Lopez, Jurgis Ruza, Michael A. Stolberg, Richard Osterude, Jeremiah A. Johnson, Rafael Gomez-Bombarelli, Yang Shao-Horn

Summary: Solid polymer electrolytes (SPEs) could improve lithium-ion batteries by enhancing safety and enabling higher energy densities. A chemistry-informed machine learning model was developed to predict the ionic conductivity of SPEs, trained on data from hundreds of experimental publications. The model encodes the Arrhenius equation into the readout layer of a neural network, which improves the accuracy of its ionic conductivity predictions.

ACS CENTRAL SCIENCE (2023)

Article Materials Science, Multidisciplinary

Graph theory-based structural analysis on density anomaly of silica glass

Aik Rui Tan, Shingo Urata, Masatsugu Yamada, Rafael Gomez-Bombarelli

Summary: Analyzing the atomic structure of glassy materials is challenging, but a graph-theoretical approach can help characterize topological differences between disordered structural arrangements. Comparing different thermodynamic states of silica glass showed that silica glasses exhibit distinct topological features at temperatures above the fictive temperature. The graph-based analysis suggests that the anomalous density behavior of silica glass may be attributed to increased formation of oxygen triclusters and fewer large cycles at the density-minimum temperature.

COMPUTATIONAL MATERIALS SCIENCE (2023)

Article Multidisciplinary Sciences

Approaching enzymatic catalysis with zeolites or how to select one reaction mechanism competing with others

Pau Ferri, Chengeng Li, Daniel Schwalbe-Koda, Mingrou Xie, Manuel Moliner, Rafael Gomez-Bombarelli, Mercedes Boronat, Avelino Corma

Summary: Approaching the level of molecular recognition of enzymes with solid catalysts is a challenging goal, achieved in this work for the competing transalkylation and disproportionation of diethylbenzene catalyzed by acid zeolites. The key diaryl intermediates for the two competing reactions only differ in the number of ethyl substituents in the aromatic rings, and therefore finding a selective zeolite able to recognize this subtle difference requires an accurate balance of the stabilization of reaction intermediates and transition states inside the zeolite microporous voids.

NATURE COMMUNICATIONS (2023)

Article Energy & Fuels

Simulations with machine learning potentials identify the ion conduction mechanism mediating non-Arrhenius behavior in LGPS

Gavin Winter, Rafael Gomez-Bombarelli

Summary: Li10Ge(PS6)2 (LGPS) is a highly concentrated solid electrolyte in which Coulombic repulsion between neighboring cations has been hypothesized to drive the ion-hopping mechanism. Using a neural network potential trained on density functional theory (DFT) data, the authors ran MD simulations to study ion conduction mechanisms over a range of temperatures spanning those of previous simulations and experimental studies. The results showed a Li+ sublattice phase transition in LGPS near 400 K that drastically reduced ab-plane diffusivity. The transition was accompanied by weaker cation-cation correlation and more harmonic vibrations at lower temperature, consistent with slower ion conduction.

JOURNAL OF PHYSICS-ENERGY (2023)

Article Chemistry, Multidisciplinary

Thermal Half-Lives of Azobenzene Derivatives: Virtual Screening Based on Intersystem Crossing Using a Machine Learning Potential

Simon Axelrod, Eugene Shakhnovich, Rafael Gomez-Bombarelli

Summary: Molecular photoswitches, such as azobenzene, play a crucial role in light-activated drugs. This study presents a computational tool for predicting the thermal half-lives of azobenzene derivatives, using a fast and accurate machine learning potential trained on quantum chemistry data. The research explores trends and trade-offs between barriers and absorption wavelengths, and provides open access to the data and software for further research in photopharmacology.

ACS CENTRAL SCIENCE (2023)

Proceedings Paper Computer Science, Artificial Intelligence

Generative Coarse-Graining of Molecular Conformations

Wujie Wang, Minkai Xu, Chen Cai, Benjamin Kurt Miller, Tess Smidt, Yusu Wang, Jian Tang, Rafael Gomez-Bombarelli

Summary: This paper proposes a model for reconstructing fine-grained coordinates from coarse-grained coordinates. The model encodes the uncertainty of the fine-grained representation in a latent space and decodes it back to fine-grained geometries using equivariant convolutions. Experimental results show that the approach recovers more realistic structures and outperforms existing data-driven methods.

INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162 (2022)

Article Chemistry, Multidisciplinary

Learning Matter: Materials Design with Machine Learning and Atomistic Simulations

Simon Axelrod, Daniel Schwalbe-Koda, Somesh Mohapatra, James Damewood, Kevin P. Greenman, Rafael Gomez-Bombarelli

Summary: Designing new materials is crucial for addressing societal challenges, and computational techniques such as atomistic simulation and machine learning (ML) offer an avenue for rapid material invention. This article reviews the recent contributions of simulation and ML in materials design, discussing numerical representation of materials, ML methods for enhancing atomistic simulation, and high-throughput virtual screening. The limitations of ML and simulation are also discussed.

ACCOUNTS OF MATERIALS RESEARCH (2022)

Article Computer Science, Artificial Intelligence

Chemistry-informed macromolecule graph representation for similarity computation, unsupervised and supervised learning

Somesh Mohapatra, Joyce An, Rafael Gomez-Bombarelli

Summary: This article presents the development of a chemistry-informed graph representation of macromolecules, allowing for quantification of structural similarity and interpretable supervised learning. It enables quantitative chemistry-informed decision-making and iterative design in the macromolecular chemical space.

MACHINE LEARNING-SCIENCE AND TECHNOLOGY (2022)
