Article

Generation of Silver Standard Concept Annotations from Biomedical Texts with Special Relevance to Phenotypes

Journal

PLOS ONE
Volume 10, Issue 1, Pages: -

Publisher

PUBLIC LIBRARY SCIENCE
DOI: 10.1371/journal.pone.0116040

Keywords

-

Funding

  1. European Commission through the Marie Curie International Incoming Fellowship (IIF) programme (Project: Phenominer) [301806]
  2. Wellcome Trust [098051]
  3. National Institutes of Health [1 U54 HG006370-01]
  4. Australian Research Council (ARC) Discovery Early Career Researcher Award (DECRA) [DE120100508]
  5. Western Australian Government Department of Health, IRDiRC
  6. RD-Connect-European Union [305444]
  7. Australian National Health and Medical Research Council (NHMRC) under the NHMRC-European Union Collaborative Research Grants scheme [APP1055319]
  8. Education Investment Fund

Abstract

Electronic health records and scientific articles possess differing linguistic characteristics that may impact the performance of natural language processing tools developed for one or the other. In this paper, we investigate the performance of four extant concept recognition tools: the clinical Text Analysis and Knowledge Extraction System (cTAKES), the National Center for Biomedical Ontology (NCBO) Annotator, the Biomedical Concept Annotation System (BeCAS) and MetaMap. Each of the four concept recognition systems is applied to four different corpora: the i2b2 corpus of clinical documents, a PubMed corpus of Medline abstracts, a clinical trials corpus and the ShARe/CLEF corpus. In addition, we assess the individual system performances with respect to one gold standard annotation set, available for the ShARe/CLEF corpus. Furthermore, we build a silver standard annotation set from the individual systems' output and assess its quality as well as the contribution of the individual systems to that quality. Our results demonstrate that mainly the NCBO Annotator and cTAKES contribute to the silver standard corpora (F1-measures in the range of 21% to 74%) and to their quality (best F1-measure of 33%), independently of the type of text investigated. While BeCAS and MetaMap can contribute to the precision of silver standard annotations (precision of up to 42%), the F1-measure drops when they are combined with the NCBO Annotator and cTAKES due to low recall. In conclusion, the performance of the individual systems needs to be improved independently of the text types, and the leveraging strategies used to best take advantage of the individual systems' annotations need to be revised. The textual content of the PubMed corpus, accession numbers for the clinical trials corpus, and the annotations assigned by the four concept recognition systems, as well as the generated silver standard annotation sets, are available from http://purl.org/phenotype/resources. The textual content of the ShARe/CLEF (https://sites.google.com/site/shareclefehealth/data) and i2b2 (https://i2b2.org/NLP/DataSets/) corpora needs to be requested from the individual corpus providers.
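The abstract describes the silver standard only at a high level: it is built from the individual systems' output and scored with precision, recall and F1 (F1 = 2PR/(P+R)). As a rough illustration of that idea, the minimal sketch below assumes a simple majority vote over exact-match annotation spans; the `Annotation` type, the `min_votes` threshold, and the example CUIs are illustrative assumptions, not the authors' implementation.

```python
"""Hedged sketch: silver-standard generation by voting over the
annotations of several concept recognition systems, plus span-level
precision/recall/F1. The harmonisation strategy (majority vote over
exact-match spans) is an assumption for illustration only."""
from collections import Counter
from typing import NamedTuple


class Annotation(NamedTuple):
    start: int        # character offset where the mention begins
    end: int          # character offset where the mention ends
    concept_id: str   # e.g. a UMLS CUI such as "C0015967"


def silver_standard(system_outputs: list[set[Annotation]],
                    min_votes: int = 2) -> set[Annotation]:
    """Keep every annotation proposed by at least `min_votes` systems."""
    votes = Counter(a for output in system_outputs for a in output)
    return {a for a, n in votes.items() if n >= min_votes}


def prf1(predicted: set[Annotation], reference: set[Annotation]):
    """Exact-match precision, recall and F1 against a reference set."""
    tp = len(predicted & reference)
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(reference) if reference else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


if __name__ == "__main__":
    # Toy outputs from three hypothetical systems on one document.
    fever = Annotation(10, 15, "C0015967")
    cough = Annotation(20, 25, "C0010200")
    sys_a = {fever, cough}
    sys_b = {fever}
    sys_c = {cough, Annotation(30, 38, "C0020538")}

    silver = silver_standard([sys_a, sys_b, sys_c], min_votes=2)
    print(silver)               # {fever, cough}: each got two votes
    print(prf1(sys_a, silver))  # (1.0, 1.0, 1.0)
```

Varying `min_votes` trades precision against recall, which mirrors the abstract's observation that adding lower-recall systems to the pool can raise the precision of the silver standard while lowering the combined F1-measure.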
