4.6 Article

Exploring the GDB-13 chemical space using deep generative models

Journal

JOURNAL OF CHEMINFORMATICS
Volume 11, Issue -, Pages -

Publisher

BMC
DOI: 10.1186/s13321-019-0341-z

Keywords

Deep learning; Chemical space exploration; Deep generative models; Recurrent neural networks; Chemical databases

Funding

  1. European Union [676434]
  2. Marie Curie Actions (MSCA) [676434]

Recent applications of recurrent neural networks (RNNs) make it possible to train models that sample chemical space. In this study, we train RNNs on molecular string representations (SMILES) using a subset of the enumerated database GDB-13 (975 million molecules). We show that a model trained on 1 million structures (0.1% of the database) reproduces 68.9% of the entire database when 2 billion molecules are sampled from it. We also develop a method to assess the quality of the training process using negative log-likelihood plots. Furthermore, we use a mathematical model based on the coupon collector problem to compare the trained model against an upper bound, which lets us quantify how much it has learned. We suggest that this method can serve as a tool to benchmark the learning capabilities of any molecular generative model architecture. Finally, an analysis of the generated chemical space shows that, mostly because of SMILES syntax, complex molecules with many rings and heteroatoms are more difficult to sample.
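
The coupon collector comparison can be illustrated with a short calculation. The following is a minimal sketch, not the authors' code: it assumes an ideal model that draws each GDB-13 molecule uniformly and independently with replacement, so that the expected fraction of the database seen after k samples from N molecules is 1 - (1 - 1/N)^k. The function name `expected_coverage` and the final ratio are illustrative assumptions; the paper may define its benchmark score differently.

```python
import math


def expected_coverage(n_items: int, n_samples: int) -> float:
    """Expected fraction of n_items distinct items seen after n_samples
    uniform draws with replacement (coupon collector setting).

    A given item is missed by every draw with probability (1 - 1/n)^k,
    so the expected coverage is 1 - (1 - 1/n)^k. log1p/expm1 keep the
    computation accurate when 1/n is tiny.
    """
    return -math.expm1(n_samples * math.log1p(-1.0 / n_items))


# Figures quoted in the abstract.
GDB13_SIZE = 975_000_000       # molecules in GDB-13
SAMPLE_SIZE = 2_000_000_000    # molecules sampled from the trained model
OBSERVED_COVERAGE = 0.689      # fraction of GDB-13 reproduced by the model

# Upper bound under the uniform-sampling assumption.
ideal_coverage = expected_coverage(GDB13_SIZE, SAMPLE_SIZE)

# Illustrative ratio of observed to ideal coverage (an assumption of this
# sketch, not necessarily the paper's metric).
relative_coverage = OBSERVED_COVERAGE / ideal_coverage

print(f"ideal coverage:    {ideal_coverage:.3f}")
print(f"relative coverage: {relative_coverage:.3f}")
```

Under these assumptions even a perfect uniform sampler would not recover the whole database in 2 billion draws, which is why the observed 68.9% is better judged against this bound than against 100%.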