Article

Protein Function Prediction With Functional and Topological Knowledge of Gene Ontology

Journal

IEEE TRANSACTIONS ON NANOBIOSCIENCE
Volume 22, Issue 4, Pages 755-762

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TNB.2023.3278033

Keywords

Protein function prediction; gene ontology; multi-view GCN; pre-trained language model

In this study, we propose a method that combines functional and topological knowledge to guide protein function prediction. Our method uses a multi-view GCN model to extract various GO representations and employs an attention mechanism to dynamically learn the significance weights of these representations. Experimental results demonstrate that our method outperforms other approaches on datasets from different species.
Gene Ontology (GO) is a widely used bioinformatics resource for describing the biological processes, molecular functions, and cellular components of proteins. It comprises more than 5000 terms, hierarchically organized into a directed acyclic graph, together with known functional annotations. Automatically annotating protein functions with GO-based computational models has long been an active area of research. However, because functional annotation information is limited and the topological structure of GO is complex, existing models cannot effectively capture the knowledge representation of GO. To address this issue, we present a method that fuses the functional and topological knowledge of GO to guide protein function prediction. The method employs a multi-view GCN model to extract a variety of GO representations from functional information, topological structure, and their combinations. It then adopts an attention mechanism that dynamically learns the significance weights of these representations to obtain the final knowledge representation of GO. Furthermore, it uses a pre-trained language model (i.e., ESM-1b) to efficiently learn biological features for each protein sequence. Finally, it obtains all predicted scores by calculating the dot product of the sequence features and the GO representation. Experimental results on datasets from three different species, namely Yeast, Human, and Arabidopsis, show that our method outperforms other state-of-the-art methods. The code for the proposed method is available at: https://github.com/Candyperfect/Master.
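
The last two steps described above (attention-weighted fusion of several GO representations, then dot-product scoring against sequence features) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names `fuse_views` and `predict_scores`, the tanh-based attention form, the sigmoid applied to the dot product, and all dimensions are assumptions chosen for illustration. Random arrays stand in for the multi-view GCN outputs and for the ESM-1b sequence features, which are assumed to be already projected to the same dimension as the GO representation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def fuse_views(views, W, b, q):
    """Attention-weighted fusion of per-view GO term embeddings (assumed form).

    views: list of V arrays, each of shape (num_terms, dim)
    W, b, q: hypothetical attention parameters, shapes (dim, hidden), (hidden,), (hidden,)
    Returns the fused representation (num_terms, dim) and weights (num_terms, V).
    """
    H = np.stack(views, axis=1)                  # (num_terms, V, dim)
    scores = np.tanh(H @ W + b) @ q              # (num_terms, V)
    alpha = softmax(scores, axis=1)              # weights over views, per GO term
    fused = (alpha[..., None] * H).sum(axis=1)   # (num_terms, dim)
    return fused, alpha

def predict_scores(protein_feats, go_repr):
    # Dot product of each protein vector with each GO term vector,
    # squashed to (0, 1) with a sigmoid (the sigmoid is an assumption).
    logits = protein_feats @ go_repr.T           # (num_proteins, num_terms)
    return 1.0 / (1.0 + np.exp(-logits))

rng = np.random.default_rng(0)
num_terms, dim, hidden, num_proteins = 6, 8, 4, 3

# Placeholders for three GO views (e.g. functional, topological, combined).
views = [rng.normal(size=(num_terms, dim)) for _ in range(3)]
W = rng.normal(size=(dim, hidden))
b = np.zeros(hidden)
q = rng.normal(size=hidden)

go_repr, alpha = fuse_views(views, W, b, q)

# Placeholder for sequence features from a pre-trained language model.
protein_feats = rng.normal(size=(num_proteins, dim))
scores = predict_scores(protein_feats, go_repr)
print(scores.shape)
```

Each row of `scores` holds one protein's predicted score for every GO term, and each row of `alpha` shows how much each view contributes to that term's fused representation. In a trained model the attention parameters and embeddings would be learned rather than sampled at random.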
