期刊
JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION
卷 19, 期 5, 页码 800-808出版社
OXFORD UNIV PRESS
DOI: 10.1136/amiajnl-2011-000775
关键词
-
类别
资金
- National Library of Medicine [1R01GM095476]
Objective Relation extraction in biomedical text mining systems has largely focused on identifying clause-level relations, but increasing sophistication demands the recognition of relations at discourse level. A first step in identifying discourse relations involves the detection of discourse connectives: words or phrases used in text to express discourse relations. In this study supervised machine-learning approaches were developed and evaluated for automatically identifying discourse connectives in biomedical text. Materials and Methods Two supervised machine-learning models (support vector machines and conditional random fields) were explored for identifying discourse connectives in biomedical literature. In-domain supervised machine-learning classifiers were trained on the Biomedical Discourse Relation Bank, an annotated corpus of discourse relations over 24 full-text biomedical articles (similar to 112 000 word tokens), a subset of the GENIA corpus. Novel domain adaptation techniques were also explored to leverage the larger open-domain Penn Discourse Treebank (similar to 1 million word tokens). The models were evaluated using the standard evaluation metrics of precision, recall and F1 scores. Results and Conclusion Supervised machine-learning approaches can automatically identify discourse connectives in biomedical text, and the novel domain adaptation techniques yielded the best performance: 0.761 F1 score. A demonstration version of the fully implemented classifier BioConn is available at: http://bioconn.askhermes.org.
作者
我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。
推荐
暂无数据