Article

NSECDA: Natural Semantic Enhancement for CircRNA-Disease Association Prediction

Journal

IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS
Volume 26, Issue 10, Pages 5075-5084

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/JBHI.2022.3199462

Keywords

Diseases; Semantics; Predictive models; Biological system modeling; Data models; Feature extraction; RNA; Circular RNA; CircRNA-disease association; Natural semantic; Graph attention network; Rotation forest

Funding

  1. Science and Technology Innovation 2030-Brain Science and Brain-like Research Major Project [2021ZD0200403]
  2. National Natural Science Foundation of China [62172355, 61702444]
  3. Qingtan Scholar Talent Project of Zaozhuang University


Increasing evidence suggests a close relationship between circRNA and diseases, making circRNA a promising biomarker. This paper proposes a natural semantic enhancement method, NSECDA, to predict circRNA-disease associations. The method combines natural language understanding theory, disease attributes, and circRNA and disease Gaussian Interaction Profile (GIP) kernel attributes, and uses a Graph Attention Network to focus on influential attributes. On the CircR2Disease data set, NSECDA achieved 92.49% accuracy with an AUC score of 0.9225.
Increasing evidence suggests that circRNA, one of the most promising emerging biomarkers, is closely related to diseases. Exploring the relationship between circRNA and diseases can provide novel perspectives on disease diagnosis and pathogenesis. Existing circRNA-disease association (CDA) prediction models, however, generally treat all data attributes equally: they pay no special attention to the attributes with more significant influence, and they do not make full use of the correlation and symbiosis between attributes to uncover the latent semantic information in the data. In response to these problems, this paper proposes a natural semantic enhancement method, NSECDA, to predict CDA. In practical terms, we first treat the circRNA sequence as a biological language and analyze its natural semantic properties through natural language understanding theory; we then integrate these properties with disease attributes and with circRNA and disease Gaussian Interaction Profile (GIP) kernel attributes, and use a Graph Attention Network (GAT) to focus on the influential attributes so as to mine deeply hidden features; finally, we use the Rotation Forest (RoF) classifier to determine CDA accurately. On the gold-standard data set CircR2Disease, NSECDA achieved 92.49% accuracy with an AUC score of 0.9225. In comparison with the non-natural semantic enhancement model and other classifier models, NSECDA also shows competitive performance. Additionally, 25 of the CDA pairs with unknown associations among the top 30 prediction scores of NSECDA have been confirmed by newly reported studies. These results suggest that NSECDA is an effective model for predicting CDA that can provide credible candidates for subsequent wet experiments, thus significantly reducing the scope of investigations.
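
The abstract names Gaussian Interaction Profile (GIP) kernel attributes for circRNAs and diseases but does not give a formula. As a minimal sketch, assuming the definition commonly used in association-prediction work, the similarity between two entities is exp(-gamma * ||IP(i) - IP(j)||^2), where IP(i) is the entity's row of the binary association matrix and gamma is a bandwidth divided by the mean squared norm of all rows. The function name `gip_kernel`, the parameter `gamma_prime`, and the toy matrix are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian Interaction Profile kernel over the rows of `profiles`.

    Assumed (commonly used) definition, not confirmed by the abstract:
        K[i, j] = exp(-gamma * ||p_i - p_j||^2)
        gamma   = gamma_prime / mean_i(||p_i||^2)
    """
    profiles = np.asarray(profiles, dtype=float)
    sq_norms = np.sum(profiles ** 2, axis=1)
    mean_sq_norm = sq_norms.mean()
    if mean_sq_norm == 0:
        raise ValueError("association matrix has no known associations")
    gamma = gamma_prime / mean_sq_norm

    # Pairwise squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (profiles @ profiles.T)
    np.maximum(sq_dists, 0.0, out=sq_dists)  # guard against tiny negative round-off
    return np.exp(-gamma * sq_dists)


# Hypothetical toy association matrix: 4 circRNAs (rows) x 3 diseases (columns).
A = np.array([
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
    [0, 1, 1],
])

circ_sim = gip_kernel(A)       # circRNA-circRNA similarity, shape (4, 4)
dis_sim = gip_kernel(A.T)      # disease-disease similarity, shape (3, 3)
```

Each row of `circ_sim` (or `dis_sim`) could then serve as one block of the attribute vector for the corresponding circRNA (or disease). How NSECDA combines these blocks with the sequence-derived and disease attributes before the GAT is not specified in the abstract.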
