☆ 4.7 Article

Disambiguating the species of biomedical named entities using natural language parsers

BIOINFORMATICS (2010)

Journal

BIOINFORMATICS

Volume 26, Issue 5, Pages 661-667

Publisher

OXFORD UNIV PRESS

DOI: 10.1093/bioinformatics/btq002

Keywords

Funding

Pfizer Ltd.
Joint Information Systems Committee
BBSRC [BB/G013160/1] Funding Source: UKRI
Biotechnology and Biological Sciences Research Council [BB/G013160/1] Funding Source: researchfish

Ask authors/readers for more resources

Protocol

Community support

Reagent

Community support

Abstract

Motivation: Text mining technologies have been shown to reduce the laborious work involved in organizing the vast amount of information hidden in the literature. One challenge in text mining is linking ambiguous word forms to unambiguous biological concepts. This article reports on a comprehensive study on resolving the ambiguity in mentions of biomedical named entities with respect to model organisms and presents an array of approaches, with focus on methods utilizing natural language parsers. Results: We build a corpus for organism disambiguation where every occurrence of protein/gene entity is manually tagged with a species ID, and evaluate a number of methods on it. Promising results are obtained by training a machine learning model on syntactic parse trees, which is then used to decide whether an entity belongs to the model organism denoted by a neighbouring species-indicating word (e.g. yeast). The parser-based approaches are also compared with a supervised classification method and results indicate that the former are a more favorable choice when domain portability is of concern. The best overall performance is obtained by combining the strengths of syntactic features and supervised classification.

Disambiguating the species of biomedical named entities using natural language parsers

Journal

BIOINFORMATICS

Publisher

OXFORD UNIV PRESS

Keywords

Categories

Funding

Ask authors/readers for more resources

Protocol

Reagent

Authors

I am an author on this paper

Reviews

Primary Rating

Secondary Ratings

Novelty

Significance

Scientific rigor

Rate this paper

Recommended

Disambiguating the species of biomedical named entities using natural language parsers

Journal

BIOINFORMATICS

Publisher

OXFORD UNIV PRESS

Keywords

Categories

Funding

Ask authors/readers for more resources

Protocol

Reagent

Authors

I am an author on this paper

Reviews

Primary Rating

Secondary Ratings

Novelty

Significance

Scientific rigor

Rate this paper

Recommended

Export Citation

Share Paper