Article

ParaBTM: A Parallel Processing Framework for Biomedical Text Mining on Supercomputers

MOLECULES (2018)

Journal

MOLECULES

Volume 23, Issue 5, Pages -

Publisher

MDPI

DOI: 10.3390/molecules23051028

Keywords

biomedical text mining; big data; Tianhe-2; parallel computing; load balancing

Categories

Biochemistry & Molecular Biology; Chemistry, Multidisciplinary

Funding

National Key R&D Program of China [2018YFB1003203]
National Natural Science Foundation of China [31501073, 61672528, 61773392]

Abstract

A prevailing way of extracting valuable information from biomedical literature is to apply text mining methods to unstructured text. However, the massive amount of literature that needs to be analyzed poses a big data challenge to the processing efficiency of text mining. In this paper, we address this challenge by introducing parallel processing on a supercomputer. We developed paraBTM, a runnable framework that enables parallel text mining on the Tianhe-2 supercomputer. It employs a low-cost yet effective load balancing strategy to maximize the efficiency of parallel processing. We evaluated the performance of paraBTM on several datasets, using three types of named entity recognition (NER) tasks as a demonstration. Results show that, in most cases, the processing efficiency can be greatly improved with parallel processing, and that the proposed load balancing strategy is simple and effective. In addition, our framework can be readily applied to biomedical text mining tasks other than NER.
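The abstract does not say how the load balancing strategy works, so the following is only an illustrative sketch of one common low-cost approach, not paraBTM's actual method. It assumes that document size is a usable proxy for processing time and assigns the largest documents first, each to the currently least-loaded worker (greedy longest-processing-time scheduling). The names `assign_documents`, `worker_loads`, and `doc_sizes` are hypothetical.

```python
import heapq


def assign_documents(doc_sizes, num_workers):
    """Greedy assignment: largest documents first, each to the least-loaded worker.

    doc_sizes maps a document id to its size (assumed here to approximate
    processing cost). Returns one list of document ids per worker.
    """
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")

    # Min-heap of (current_load, worker_index); the least-loaded worker is on top.
    heap = [(0, w) for w in range(num_workers)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_workers)]

    # Visit documents from largest to smallest.
    for doc_id in sorted(doc_sizes, key=doc_sizes.get, reverse=True):
        load, worker = heapq.heappop(heap)
        assignment[worker].append(doc_id)
        heapq.heappush(heap, (load + doc_sizes[doc_id], worker))

    return assignment


def worker_loads(doc_sizes, assignment):
    """Total assigned size per worker, useful for checking the balance."""
    return [sum(doc_sizes[d] for d in docs) for docs in assignment]


if __name__ == "__main__":
    sizes = {"a": 9, "b": 7, "c": 6, "d": 5, "e": 4}  # made-up sizes
    plan = assign_documents(sizes, num_workers=2)
    print(plan, worker_loads(sizes, plan))
```

A static plan like this costs one sort plus a heap operation per document, which is cheap relative to the text mining itself. It is less effective when size predicts processing time poorly.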
