Article

Accurate Prediction of Aqueous Free Solvation Energies Using 3D Atomic Feature-Based Graph Neural Network with Transfer Learning

Journal

JOURNAL OF CHEMICAL INFORMATION AND MODELING
Volume 62, Issue 8, Pages 1840-1848

Publisher

AMER CHEMICAL SOC
DOI: 10.1021/acs.jcim.2c00260

Funding

  1. U.S. National Institutes of Health [R35-GM127040]

In this study, a large and diverse calculated data set of aqueous solvation free energies, Frag20-Aqsol-100K, is built using electronic structure calculations with continuum solvent models. A novel 3D atomic feature-based GNN model is developed, and a transfer learning strategy is employed to achieve state-of-the-art prediction on the FreeSolv data set. The results indicate that integrating molecular modeling and DL is a promising strategy for developing robust prediction models in molecular science.
Graph neural network (GNN)-based deep learning (DL) models have been widely used to predict experimental aqueous solvation free energies, but their prediction accuracy has reached a plateau, partly because experimental data are scarce. To tackle this challenge, we first build a large and diverse calculated data set of aqueous solvation free energies, Frag20-Aqsol-100K, at reasonable computational cost and accuracy via electronic structure calculations with continuum solvent models. Then, we develop a novel 3D atomic feature-based GNN model with principal neighborhood aggregation (PNAConv) and demonstrate that 3D atomic features obtained from molecular mechanics-optimized geometries can significantly improve the learning power of GNN models in predicting calculated solvation free energies. Finally, we employ a transfer learning strategy, pre-training our DL model on Frag20-Aqsol-100K and fine-tuning it on the small experimental data set; the fine-tuned model, A3D-PNAConv-FT, achieves state-of-the-art prediction on the FreeSolv data set, with a root-mean-squared error of 0.719 kcal/mol and a mean absolute error of 0.417 kcal/mol using random data splits. These results indicate that integrating molecular modeling and DL is a promising strategy for developing robust prediction models in molecular science. The source code and data are accessible at: https://yzhang.hpc.nyu.edu/IMA.
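
The pre-train-then-fine-tune idea and the two reported error metrics can be illustrated with a deliberately tiny sketch. This is not the authors' A3D-PNAConv model: it assumes a one-parameter-pair linear model, synthetic numbers standing in for the "calculated" and "experimental" data sets, and hypothetical helper names (`train`, `rmse`, `mae`). It only shows why starting from weights learned on a large, related data set may help when few experimental points are available.

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def train(xs, ys, w, b, lr, epochs):
    """Full-batch gradient descent on mean squared error for y = w*x + b."""
    n = len(xs)
    for _ in range(epochs):
        errs = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2.0 * sum(e * x for e, x in zip(errs, xs)) / n
        grad_b = 2.0 * sum(errs) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict(xs, w, b):
    return [w * x + b for x in xs]

# Assumption: a large synthetic "calculated" set (stand-in for a computed data set).
calc_x = [i / 100.0 for i in range(-100, 101)]
calc_y = [2.0 * x + 1.0 for x in calc_x]

# Assumption: a small synthetic "experimental" set with a systematic offset.
exp_x = [-0.8, -0.3, 0.1, 0.5, 0.9]
exp_y = [2.0 * x + 1.3 for x in exp_x]
test_x = [-0.6, 0.0, 0.7]
test_y = [2.0 * x + 1.3 for x in test_x]

# 1) Pre-train on the large calculated set.
w_pre, b_pre = train(calc_x, calc_y, 0.0, 0.0, lr=0.1, epochs=500)
# 2) Fine-tune on the small experimental set, starting from pre-trained weights.
w_ft, b_ft = train(exp_x, exp_y, w_pre, b_pre, lr=0.05, epochs=20)
# Baseline: same small training budget, but starting from scratch.
w_sc, b_sc = train(exp_x, exp_y, 0.0, 0.0, lr=0.05, epochs=20)

rmse_pre = rmse(test_y, predict(test_x, w_pre, b_pre))
rmse_ft = rmse(test_y, predict(test_x, w_ft, b_ft))
rmse_sc = rmse(test_y, predict(test_x, w_sc, b_sc))
mae_ft = mae(test_y, predict(test_x, w_ft, b_ft))

print(f"pre-trained only: RMSE={rmse_pre:.3f}")
print(f"fine-tuned:       RMSE={rmse_ft:.3f}  MAE={mae_ft:.3f}")
print(f"from scratch:     RMSE={rmse_sc:.3f}")
```

In this toy setting the fine-tuned model has a lower held-out RMSE than both the pre-trained-only model and the model trained from scratch, which mirrors the motivation stated in the abstract; the numbers themselves carry no chemical meaning.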
