Article

Siamese Recurrent Neural Network with a Self-Attention Mechanism for Bioactivity Prediction

Journal

ACS Omega
Volume 6, Issue 16, Pages 11086-11094

Publisher

American Chemical Society
DOI: 10.1021/acsomega.1c01266

Activity prediction is crucial in drug discovery. The SiameseCHEM model predicts the bioactivity of small molecules directly from SMILES strings, outperforming two ECFP-based baseline machine learning models on three of five datasets and matching them on the other two.
Activity prediction plays an essential role in drug discovery by directing the search for drug candidates in the relevant chemical space. Despite being applied successfully to image recognition and semantic similarity, the Siamese neural network has rarely been explored in drug discovery, where modelling faces challenges such as insufficient data and class imbalance. Here, we present a Siamese recurrent neural network model (SiameseCHEM) based on a bidirectional long short-term memory architecture with a self-attention mechanism, which can automatically learn discriminative features from the SMILES representations of small molecules. The model is then used to categorize the bioactivity of small molecules via N-shot learning. Trained on random SMILES strings, it proves robust across five different datasets for binary or categorical classification of bioactivity. Benchmarked against two baseline machine learning models that use chemistry-rich ECFP fingerprints as input, the deep learning model outperforms them on three datasets and achieves comparable performance on the other two. The failure of both baseline methods on SMILES strings suggests that the deep learning model may learn task-specific chemistry features encoded in SMILES strings.
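The N-shot classification step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state how SiameseCHEM scores a query against its support examples, so everything here is an assumption made for illustration: the function names (`bigrams`, `toy_similarity`, `n_shot_classify`) are hypothetical, the averaging rule is one common choice among several, and a character-bigram Jaccard overlap stands in for the similarity that the trained Siamese network would produce. The class labels are made up and carry no bioactivity meaning.

```python
from collections import defaultdict


def bigrams(smiles):
    """Return the set of character bigrams in a SMILES string."""
    return {smiles[i:i + 2] for i in range(len(smiles) - 1)}


def toy_similarity(a, b):
    """Placeholder for a learned Siamese similarity (assumption):
    Jaccard overlap of character bigrams, in the range [0, 1]."""
    ga, gb = bigrams(a), bigrams(b)
    if not ga and not gb:
        return 1.0
    return len(ga & gb) / len(ga | gb)


def n_shot_classify(query, support, similarity=toy_similarity):
    """Assign the query to the class whose support examples are,
    on average, most similar to it.

    support: list of (smiles, label) pairs, N examples per class.
    Returns the predicted label and the mean score per class.
    """
    scores = defaultdict(list)
    for smiles, label in support:
        scores[label].append(similarity(query, smiles))
    mean_scores = {label: sum(v) / len(v) for label, v in scores.items()}
    return max(mean_scores, key=mean_scores.get), mean_scores


# Hypothetical 2-shot support set; the labels are illustrative only.
support = [
    ("CCO", "class_a"),
    ("CCCO", "class_a"),
    ("c1ccccc1", "class_b"),
    ("c1ccccc1C", "class_b"),
]

label, mean_scores = n_shot_classify("CCCCO", support)
print(label, mean_scores)
```

In a real setting, `toy_similarity` would be replaced by a function that encodes both SMILES strings with the shared network and compares the resulting embeddings. The surrounding N-shot logic would not need to change.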
