Journal
IEEE TRANSACTIONS ON NANOBIOSCIENCE
Volume 22, Issue 4, Pages 755-762Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TNB.2023.3278033
Keywords
Protein function prediction; gene ontology; multi-view GCN; pre-trained language model
Ask authors/readers for more resources
In this study, we propose a method that combines functional and topological knowledge to guide protein function prediction. Our method uses a multi-view GCN model to extract various GO representations and employs an attention mechanism to dynamically learn the significance weights of these representations. Experimental results demonstrate that our method outperforms other approaches on datasets from different species.
Gene Ontology (GO) is a widely used bioinformatics resource for describing biological processes, molecular functions, and cellular components of proteins. It covers more than 5000 terms hierarchically organized into a directed acyclic graph and known functional annotations. Automatically annotating protein functions by using GO-based computational models has been an area of active research for a long time. However, due to the limited functional annotation information and complex topological structures of GO, existing models cannot effectively capture the knowledge representation of GO. To solve this issue, we present a method that fuses the functional and topological knowledge of GO to guide protein function prediction. This method employs a multi-view GCN model to extract a variety of GO representations from functional information, topological structure, and their combinations. To dynamically learn the significance weights of these representations, it adopts an attention mechanism to learn the final knowledge representation of GO. Furthermore, it uses a pre-trained language model (i.e., ESM-1b) to efficiently learn biological features for each protein sequence. Finally, it obtains all predicted scores by calculating the dot product of sequence features and GO representation. Our method outperforms other state-of-the-art methods, as demonstrated by the experimental results on datasets from three different species, namely Yeast, Human and Arabidopsis. Our proposed method's code can be accessed at: https://github.com/Candyperfect/Master.
Authors
I am an author on this paper
Click your name to claim this paper and add it to your profile.
Reviews
Recommended
No Data Available