Article

BioC: a minimalist approach to interoperability for biomedical text processing

DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION (2013)

Journal

DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION

Publisher

OXFORD UNIV PRESS

DOI: 10.1093/database/bat064

Category

Mathematical & Computational Biology

Funding

Intramural Research Program of the National Institutes of Health, National Library of Medicine
National Library of Medicine [G08LM010720]
National Institutes of Health [NIH 5R01 LM009254-07, NIH 5R01 LM008111-08]
National Science Foundation [DBI-1062520]
Swiss National Science Foundation (SNF) [105315_130558/1]
National Science Foundation, Directorate for Biological Sciences, Division of Biological Infrastructure [0850319, 1062520]

Abstract

A vast amount of scientific information is encoded in natural language text, and the quantity of such text has become so great that it is no longer economically feasible to have a human as the first step in the search process. Natural language processing and text mining tools have become essential for finding and extracting information from text. This has led to vigorous research efforts to build useful tools and to create human-annotated text corpora that can be used to improve them. To encourage combining these efforts into larger, more capable systems, a common interchange format that represents, stores and exchanges data simply between different language processing systems and text mining tools is highly desirable. Here we propose a simple Extensible Markup Language (XML) format for sharing text documents and annotations. The proposed annotation approach can represent many different kinds of annotation, including sentences, tokens, parts of speech, named entities such as genes or diseases, and relationships between named entities. In addition, we provide simple code to hold this data, read it from and write it back to XML files, and perform some sample processing. We also describe completed and ongoing work to apply the approach in several directions.
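
As a rough illustration of the idea in the abstract, the sketch below writes a small XML collection with one passage, two named-entity annotations and one relation, then reads it back and recovers each annotated span from its offset and length. The abstract does not spell out the schema, so the element and attribute names used here (`collection`, `document`, `passage`, `annotation`, `infon`, `location`, `relation`, `node`) are assumptions chosen for illustration, as are the example sentence and the helper names `build_collection` and `read_annotations`. It uses only the Python standard library, not the code the authors provide.

```python
import xml.etree.ElementTree as ET


def build_collection() -> ET.Element:
    """Build a tiny collection: one document, one passage, two annotations.

    Element names are illustrative assumptions, not taken from the abstract.
    """
    text = "BRCA1 mutations are linked to breast cancer."
    collection = ET.Element("collection")
    ET.SubElement(collection, "source").text = "example"
    document = ET.SubElement(collection, "document")
    ET.SubElement(document, "id").text = "doc1"
    passage = ET.SubElement(document, "passage")
    ET.SubElement(passage, "offset").text = "0"
    ET.SubElement(passage, "text").text = text

    # Stand-off annotations: each one points into the passage text
    # by character offset and length instead of wrapping the text inline.
    mentions = [("T1", "gene", "BRCA1"), ("T2", "disease", "breast cancer")]
    for ann_id, ann_type, mention in mentions:
        ann = ET.SubElement(passage, "annotation", id=ann_id)
        ET.SubElement(ann, "infon", key="type").text = ann_type
        ET.SubElement(
            ann,
            "location",
            offset=str(text.index(mention)),
            length=str(len(mention)),
        )
        ET.SubElement(ann, "text").text = mention

    # A relation refers to existing annotations by their ids.
    relation = ET.SubElement(passage, "relation", id="R1")
    ET.SubElement(relation, "infon", key="type").text = "gene-disease"
    ET.SubElement(relation, "node", refid="T1", role="gene")
    ET.SubElement(relation, "node", refid="T2", role="disease")
    return collection


def read_annotations(xml_string: str) -> list[tuple[str, str, str]]:
    """Parse the XML and return (id, type, covered text) for each annotation."""
    root = ET.fromstring(xml_string)
    results = []
    for passage in root.iter("passage"):
        base = int(passage.findtext("offset"))
        text = passage.findtext("text")
        for ann in passage.findall("annotation"):
            loc = ann.find("location")
            start = int(loc.get("offset")) - base
            end = start + int(loc.get("length"))
            ann_type = ann.find("infon[@key='type']").text
            # Slice the passage text rather than trusting the stored mention.
            results.append((ann.get("id"), ann_type, text[start:end]))
    return results


xml_string = ET.tostring(build_collection(), encoding="unicode")
annotations = read_annotations(xml_string)
for ann_id, ann_type, covered in annotations:
    print(ann_id, ann_type, covered)
```

Keeping annotations as stand-off references leaves the original text untouched, so several tools can add sentence, token or entity layers to the same passage without interfering with one another.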
