Article; Proceedings Paper

Automatic gene annotation using GO terms from cellular component domain

BMC MEDICAL INFORMATICS AND DECISION MAKING (2018)

Journal

BMC MEDICAL INFORMATICS AND DECISION MAKING

Volume 18

Publisher

BMC

DOI: 10.1186/s12911-018-0694-7

Keywords

Natural language processing; Gene ontology annotation; Relation extraction

Funding

National Key R&D Program of China [2016YFF0204205, 2018YFF0213901]
China National Institute of Standardization [522016Y-4681, 522018Y-5948, 522018Y-5941]

Abstract

Background: The Gene Ontology (GO) is a resource that supplies information about gene product function, using ontologies to represent biological knowledge. These ontologies cover three domains: Cellular Component (CC), Molecular Function (MF), and Biological Process (BP). GO annotation is the process of assigning gene functional information, expressed as GO terms, to the relevant genes mentioned in the literature. It is a common task among the Model Organism Database (MOD) groups. Manual GO annotation relies on human curators who read the biomedical literature and assign GO terms to genes. This process is very time-consuming and labor-intensive. As a result, many MODs can afford to curate only a fraction of the relevant articles.

Methods: GO terms from the CC domain can essentially be divided into two sub-hierarchies: subcellular location terms and protein complex terms. We cast the task of gene annotation using GO terms from the CC domain as relation extraction between genes and other entities: (1) extract cases where a protein is found to be in a subcellular location, and (2) extract cases where a protein is a subunit of a protein complex. For each relation extraction task, we use an approach based on triggers and syntactic dependencies to extract the desired relations among entities.

Results: We tested our approach on the BC4GO test set, a publicly available corpus for GO annotation. Our approach obtains an F1-score of 71%, a precision of 91%, and a recall of 58% for predicting GO terms from the CC domain for given genes.

Conclusions: We have described a novel approach that treats gene annotation with GO terms from the CC domain as two relation extraction subtasks. Evaluation results show that our approach achieves an F1-score of 71% for predicting GO terms for given genes. Our approach can therefore be used to accelerate GO annotation for bio-annotators.
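The three reported scores can be cross-checked against each other, assuming the F1-score here is the standard harmonic mean of precision and recall (the abstract does not state the formula). The sketch below is only an arithmetic check. The function name `f1_score` is chosen for illustration and is not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition, assumed here)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract for CC-domain GO term prediction.
f1 = f1_score(0.91, 0.58)
print(f"{f1:.2%}")  # 70.85%, which rounds to the reported 71%
```

Under that assumption, the reported precision and recall are consistent with the reported F1-score. The gap between them shows that the approach favors precision over recall.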
