Article

Improving textual medication extraction using combined conditional random fields and rule-based systems

Publisher

OXFORD UNIV PRESS
DOI: 10.1136/jamia.2010.004119

Funding

  1. Humboldt Foundation
  2. National Library of Medicine [U54LM008748]

Objective: In the i2b2 Medication Extraction Challenge, medication names and the details of their administration were to be extracted from medical discharge summaries.

Design: The task of the challenge was decomposed into three pipelined components: named entity identification (NEI), context-aware filtering, and relation extraction. For NEI, a rule-based (RB) method, used in our overall fifth-place solution at the challenge, was investigated first. Second, a conditional random fields (CRF) approach to NEI, developed after the completion of the challenge, is presented. The CRF models are trained on the 17 ground truth documents and on the output of the rule-based NEI component on all documents, a larger but potentially inaccurate training dataset. For both NEI approaches, the effect on relation extraction performance was investigated. The filtering and relation extraction components are both rule-based.

Measurements: In addition to the official entry-level evaluation of the challenge, an entity-level analysis is also provided.

Results: On the test data, an entry-level F1-score of 80% was achieved for exact matching and 81% for inexact matching with the RB-NEI component. The CRF on its own produces a significantly weaker result, but the CRF trained with the additional rule-based output outperforms the rule-based model, with 81% exact and 82% inexact F1-score (p<0.02).

Conclusion: This study shows that a simple rule-based method is on a par with more complicated machine learners. CRF models can benefit from the addition of potentially inaccurate training data when only very few training documents are available. Such training data could be generated from the outputs of rule-based methods.
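The idea of enlarging a small ground-truth set with rule-based output can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' system: the lexicon, the dosage pattern, the label names (`B-MED`, `B-DOSE`, `O`) and the helper names `rule_based_tag` and `build_training_set` are all hypothetical, and the abstract does not describe the actual rules or features.

```python
import re
from typing import List, Tuple

TaggedDoc = List[Tuple[str, str]]  # (token, label) pairs

# Hypothetical mini-lexicon and pattern; the real rule set is not given in the abstract.
DRUG_LEXICON = {"aspirin", "lisinopril", "metformin"}
DOSAGE_PATTERN = re.compile(r"^\d+(\.\d+)?(mg|mcg|g|ml)$", re.IGNORECASE)


def rule_based_tag(tokens: List[str]) -> TaggedDoc:
    """Label each token with a toy rule-based tagger (assumed label scheme)."""
    tagged = []
    for tok in tokens:
        if tok.lower() in DRUG_LEXICON:
            label = "B-MED"
        elif DOSAGE_PATTERN.match(tok):
            label = "B-DOSE"
        else:
            label = "O"
        tagged.append((tok, label))
    return tagged


def build_training_set(gold: List[TaggedDoc],
                       unlabeled: List[List[str]]) -> List[TaggedDoc]:
    """Combine the few gold documents with rule-labeled ("silver") documents."""
    silver = [rule_based_tag(doc) for doc in unlabeled]
    return gold + silver


# One hand-annotated document plus two documents labeled only by the rules.
gold_docs = [[("Metformin", "B-MED"), ("500mg", "B-DOSE"), ("bid", "O")]]
raw_docs = [["Aspirin", "81mg", "daily"], ["Continue", "lisinopril"]]
training_set = build_training_set(gold_docs, raw_docs)
```

The combined `training_set` would then be passed to whatever sequence learner is used. The silver labels may be wrong, which is the trade-off the abstract describes: more data at the cost of label noise.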
