Journal: Journal of the American Medical Informatics Association
Volume 17, Issue 4, Pages 375-382
Publisher: Oxford University Press
DOI: 10.1136/jamia.2009.001412
Funding
- VA Cooperative Studies Program
- Veterans Affairs Health Services Research and Development
- Consortium for Health Informatics Research (CHIR) [HIR 09-007]
Abstract
Reducing custom software development effort is an important goal in information retrieval (IR). This study evaluated a generalizable approach that requires no custom software or rules development. The study used documents consistent with cancer to evaluate system performance in the domains of colorectal (CRC), prostate (PC), and lung (LC) cancer. Using an end-user-supplied reference set, the automated retrieval console (ARC) iteratively calculated the performance of combinations of natural language processing-derived features and supervised classification algorithms. Training and testing involved 10-fold cross-validation for three sets of 500 documents each. Performance metrics included recall, precision, and F-measure, and annotation time for five physicians was also measured. Top-performing algorithms had recall, precision, and F-measure values as follows: for CRC, 0.90, 0.92, and 0.89, respectively; for PC, 0.97, 0.95, and 0.94; and for LC, 0.76, 0.80, and 0.75. In all but one case, conditional random fields outperformed maximum entropy-based classifiers. Algorithms achieved good performance without custom code or rules development, but performance varied by specific application.
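The recall, precision, and F-measure figures reported above are standard functions of the confusion-matrix counts from each cross-validation fold. A minimal illustrative sketch (not the ARC implementation; the counts below are hypothetical):

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F-measure from confusion counts.

    tp: true positives, fp: false positives, fn: false negatives.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-measure (F1) is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical fold: 90 true positives, 8 false positives, 10 false negatives
p, r, f = precision_recall_f1(90, 8, 10)
print(round(p, 3), round(r, 3), round(f, 3))  # 0.918 0.9 0.909
```

Because F1 is a harmonic mean, it always lies between precision and recall, which is why a classifier must balance both to score well.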