4.6 Article

Evaluation of Deep Learning Architectures for Aqueous Solubility Prediction

Journal

ACS OMEGA
Volume 7, Issue 18, Pages 15695-15710

Publisher

AMER CHEMICAL SOC
DOI: 10.1021/acsomega.2c00642

Funding

  1. Energy Storage Materials Initiative (ESMI), Laboratory Directed Research and Development Project at Pacific Northwest National Laboratory (PNNL)
  2. U.S. Department of Energy (DOE) by Battelle Memorial Institute [DE-AC05-76RL01830]

This study aims to evaluate current deep learning methods for solubility prediction and develop a general model for predicting the solubility of various organic molecules. The research found that models using molecular descriptors perform the best, with GNN models also achieving good performance.
Determining the aqueous solubility of molecules is a vital step in many pharmaceutical, environmental, and energy storage applications. Despite decades of effort, developing a solubility prediction model with satisfactory accuracy for many of these applications remains challenging. The goals of this study are to assess current deep learning methods for solubility prediction, to develop a general model capable of predicting the solubility of a broad range of organic molecules, and to understand the impact of data properties, molecular representation, and modeling architecture on predictive performance. Using the largest currently available solubility data set, we implement deep learning-based models to predict solubility from the molecular structure. We explore several molecular representations, including molecular descriptors, simplified molecular-input line-entry system (SMILES) strings, molecular graphs, and three-dimensional atomic coordinates, using four neural network architectures: fully connected neural networks, recurrent neural networks, graph neural networks (GNNs), and SchNet. We find that models using molecular descriptors achieve the best performance, with GNN models also achieving good performance. We perform extensive error analysis to understand the molecular properties that influence model performance, feature analysis to understand which information about the molecular structure is most valuable for prediction, and a transfer learning and data size study to understand the impact of data availability on model performance.
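
The descriptor-based route described in the abstract (molecular structure, then a descriptor vector, then a fully connected network that outputs one solubility value) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not the authors' code: `toy_descriptors` is a hypothetical stand-in that only counts characters in a SMILES string (real descriptors come from a cheminformatics toolkit), and `TinyMLP` is an untrained one-hidden-layer network, so its output shows only the data flow and has no chemical meaning.

```python
import random
from typing import List


def toy_descriptors(smiles: str) -> List[float]:
    """Hypothetical descriptor vector built from character counts (toy only)."""
    # Carbon count: aliphatic "C" minus chlorine "Cl", plus aromatic "c".
    carbons = smiles.count("C") - smiles.count("Cl") + smiles.count("c")
    oxygens = smiles.count("O") + smiles.count("o")
    nitrogens = smiles.count("N") + smiles.count("n")
    # Each ring closure appears as a pair of matching digits.
    rings = sum(ch.isdigit() for ch in smiles) / 2.0
    return [float(carbons), float(oxygens), float(nitrogens), rings]


class TinyMLP:
    """One hidden ReLU layer and a linear output; weights are random, not trained."""

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def predict(self, x: List[float]) -> float:
        # Hidden layer: affine transform followed by ReLU.
        hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        # Output layer: a single scalar (the regression target).
        return sum(w * h for w, h in zip(self.w2, hidden)) + self.b2


if __name__ == "__main__":
    features = toy_descriptors("CCO")  # ethanol
    model = TinyMLP(n_in=len(features), n_hidden=8)
    print(features, model.predict(features))
```

In practice the weights would be fitted to measured solubility values, and the descriptor vector would be far richer than four counts; the sketch only fixes the shape of the pipeline.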
