Journal
KNOWLEDGE AND INFORMATION SYSTEMS
Volume 60, Issue 3, Pages 1453-1478
Publisher
SPRINGER LONDON LTD
DOI: 10.1007/s10115-018-1265-z
Keywords
Named entity recognition; Feature selection; Binary PSO; Correlation; Mutual information; Normalized mutual information; Particle swarm optimization
Named entity recognition is a vital task for various applications related to biomedical natural language processing. It aims at extracting different biomedical entities from text and classifying them into predefined categories. The entity types vary with genre and domain, such as gene versus non-gene in a coarse-grained scenario, or protein, DNA, RNA, cell line, and cell type in a fine-grained scenario. In this paper, we present a novel filter-based feature selection technique that uses the search capability of particle swarm optimization (PSO) to determine the best feature combination. The technique yields an optimized feature set that, when used for classifier learning, enhances system performance. The proposed approach is assessed on four popular biomedical corpora, namely GENIA, GENETAG, AIMed, and Biocreative-II Gene Mention Recognition (BC-II). Our proposed model obtains F-score values of 74.49%, 91.11%, 90.47%, and 88.64% on the GENIA, GENETAG, AIMed, and BC-II datasets, respectively. The efficiency of feature pruning through PSO is evident from significant performance gains, even with a much reduced set of features.
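To make the idea of filter-based feature selection with binary PSO concrete, the following is a minimal sketch only, not the authors' implementation. The abstract does not give the fitness function, so the score used here (mean mutual information with the labels minus a penalty on mean pairwise mutual information among the selected features) is an assumption, as are all names (`mutual_information`, `fitness`, `binary_pso`), the PSO parameters, and the toy data.

```python
import math
import random


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    joint, px, py = {}, {}, {}
    for x, y in zip(xs, ys):
        joint[(x, y)] = joint.get((x, y), 0) + 1
        px[x] = px.get(x, 0) + 1
        py[y] = py.get(y, 0) + 1
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())


def fitness(mask, features, labels, penalty=0.5):
    """Assumed filter score: mean relevance minus penalty * mean redundancy."""
    chosen = [f for f, bit in zip(features, mask) if bit]
    if not chosen:
        return float("-inf")  # an empty feature set is never acceptable
    relevance = sum(mutual_information(f, labels) for f in chosen) / len(chosen)
    pairs = [(a, b) for i, a in enumerate(chosen) for b in chosen[i + 1:]]
    redundancy = (sum(mutual_information(a, b) for a, b in pairs) / len(pairs)
                  if pairs else 0.0)
    return relevance - penalty * redundancy


def binary_pso(features, labels, n_particles=10, n_iters=30,
               w=0.7, c1=1.5, c2=1.5, seed=0):
    """Binary PSO: each particle is a 0/1 mask over the features."""
    rng = random.Random(seed)
    d = len(features)
    pos = [[rng.randint(0, 1) for _ in range(d)] for _ in range(n_particles)]
    vel = [[0.0] * d for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p, features, labels) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for j in range(d):
                # Standard velocity update, clamped to keep the sigmoid stable.
                v = (w * vel[i][j]
                     + c1 * rng.random() * (pbest[i][j] - pos[i][j])
                     + c2 * rng.random() * (gbest[j] - pos[i][j]))
                vel[i][j] = max(-6.0, min(6.0, v))
                # The sigmoid of the velocity is the probability of the bit being 1.
                prob = 1.0 / (1.0 + math.exp(-vel[i][j]))
                pos[i][j] = 1 if rng.random() < prob else 0
            fit = fitness(pos[i], features, labels)
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit


# Toy data (hypothetical): feature 0 equals the label, the others carry no signal.
labels = [0, 0, 1, 1, 0, 0, 1, 1]
features = [
    [0, 0, 1, 1, 0, 0, 1, 1],  # identical to the labels
    [0, 1, 0, 1, 0, 1, 0, 1],  # independent of the labels
    [0, 0, 0, 0, 0, 0, 0, 0],  # constant
]
best_mask, best_score = binary_pso(features, labels)
print(best_mask, best_score)
```

Because the score depends only on statistics of the features and labels, no classifier is trained during the search, which is what distinguishes a filter approach from a wrapper approach. The selected mask would then be applied to the feature set before training the actual NER classifier.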