Article

12 years on - Is the NLM medical text indexer still useful and relevant?

JOURNAL OF BIOMEDICAL SEMANTICS (2017)

Journal

JOURNAL OF BIOMEDICAL SEMANTICS

Volume 8

Publisher

BIOMED CENTRAL LTD

DOI: 10.1186/s13326-017-0113-5

Keywords

Indexing methods; Text categorization; MeSH; MEDLINE; Machine learning; BioASQ

Funding

Intramural Research Program of the National Institutes of Health
National Library of Medicine

Abstract

Background: Facing a growing workload and dwindling resources, the US National Library of Medicine (NLM) created the Indexing Initiative project in 1996. This cross-library team's mission is to explore indexing methodologies that ensure the quality and currency of NLM document collections. The NLM Medical Text Indexer (MTI) is the main product of this project and has been providing automated indexing recommendations since 2002. After all this time, the question arises whether MTI is still useful and relevant.

Methods: To assess MTI's usefulness, we track a wide variety of statistics: how frequently MEDLINE indexers refer to MTI recommendations, how well MTI performs against human indexing, and how often MTI is used. To assess MTI's relevance compared with other available tools, we participated in the 2013 and 2014 BioASQ Challenges, which provided an unbiased comparison between MTI and other systems performing the same task.

Results: Indexers have continually increased their use of MTI recommendations, from 15.75% of the articles they indexed in 2002 to 62.44% in 2014, showing that they find MTI increasingly useful. Over the years, MTI's performance statistics show significant improvement in Precision (+0.2992) and F1 (+0.1997), with modest gains in Recall (+0.0454). MTI's consistency is comparable to that reported in available studies of indexer consistency. MTI performed well in both BioASQ Challenges, ranking among the top-tier teams.

Conclusions: Based on our findings, yes, MTI is still relevant and useful, and it needs to be improved and expanded. The BioASQ Challenge results have shown that we need to incorporate more machine learning into MTI while retaining the indexing rules that have earned MTI the indexers' trust over the years. We also need to expand MTI to use full text, when and where it is available, to cover indexing terms that are typically found only in the full text. The role of MTI at NLM is also expanding into new areas, further reinforcing the idea that MTI is increasingly useful and relevant.
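The abstract reports MTI's Precision, Recall and F1 against human indexing, plus its consistency with indexers, but does not say exactly how these figures are computed. The following is a minimal sketch under stated assumptions, not the authors' evaluation code: each article's indexing is treated as a set of MeSH heading strings, scores are pooled across articles (micro-averaging), and consistency uses Hooper's measure (shared terms divided by all distinct terms assigned by either party). The function names `micro_prf` and `hooper_consistency` and the example headings are hypothetical.

```python
"""Sketch: scoring automatic indexing recommendations against human indexing.

Assumptions (not taken from the paper): indexing is a set of MeSH heading
strings per article, scores are micro-averaged across articles, and
consistency is measured with Hooper's measure.
"""
from typing import Iterable, Set, Tuple

Indexing = Set[str]  # hypothetical representation of one article's indexing


def micro_prf(pairs: Iterable[Tuple[Indexing, Indexing]]) -> Tuple[float, float, float]:
    """Micro-averaged Precision, Recall and F1 over (recommended, human) pairs."""
    tp = n_rec = n_gold = 0
    for recommended, human in pairs:
        tp += len(recommended & human)  # terms both the system and the indexer chose
        n_rec += len(recommended)       # all terms the system recommended
        n_gold += len(human)            # all terms the indexer assigned
    precision = tp / n_rec if n_rec else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def hooper_consistency(a: Indexing, b: Indexing) -> float:
    """Hooper's consistency: shared terms / distinct terms assigned by either side.

    Two empty indexings are treated as fully consistent (a convention chosen here).
    """
    union = a | b
    return len(a & b) / len(union) if union else 1.0


if __name__ == "__main__":
    # Hypothetical MeSH headings for a single article.
    mti = {"Humans", "Neoplasms", "Mice"}
    human = {"Humans", "Neoplasms", "Female", "Adult"}
    p, r, f1 = micro_prf([(mti, human)])
    print(f"P={p:.4f} R={r:.4f} F1={f1:.4f}")                # P=0.6667 R=0.5000 F1=0.5714
    print(f"Hooper={hooper_consistency(mti, human):.2f}")   # Hooper=0.40
```

Micro-averaging weights every recommended term equally, so articles with many headings dominate the totals; a per-article (macro) average would instead weight every article equally, and which variant a given study uses changes the reported numbers.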

Improving Large-Scale k-Nearest Neighbor Text Categorization with Label Autoencoders

Francisco J. Ribadas-Pena, Shuyuan Cao, Victor M. Darriba Bilbao

Summary: In this paper, a multi-label lazy learning approach is proposed for automatic semantic indexing in large document collections with complex label vocabularies and high inter-label correlation. The method is evaluated on a portion of the MEDLINE biomedical document collection, using different document representation approaches and label autoencoder configurations.

MATHEMATICS (2022)