Article

DISNET: a framework for extracting phenotypic disease information from public sources

Journal

PEERJ
Volume 8, Issue -, Pages -

Publisher

PEERJ INC
DOI: 10.7717/peerj.8580

Keywords

Disnet framework; Natural language processing; Phenotypic information; Public sources; Disease understanding

Funding

  1. Spanish Ministerio de Ciencia, Innovacion y Universidades [RTI2018-094576-A-I00]
  2. Mexican Consejo Nacional de Ciencia y Tecnologia (CONACYT) under the programme "291114 - BECAS CONACYT AL EXTRANJERO" [CVU: 340523]
  3. Programa de fomento de la investigacion y la innovacion ("Doctorados Industriales") from Comunidad de Madrid [IND2019/TIC-17159]

Background. Within the global endeavour of improving population health, one major challenge is identifying and integrating medical knowledge scattered across several information sources. Building a comprehensive dataset of diseases and their clinical manifestations from public sources makes it possible not only to complement and merge medical knowledge but also to extend it, interconnecting existing data so that diseases can be analysed and related to each other. In this paper, we present DISNET (http://disnet.ctb.upm.es/), a web-based system designed to periodically extract knowledge about signs and symptoms from medical databases and to enable the creation of customisable disease networks.

Methods. We present the main features of the DISNET system and describe how information on diseases and their phenotypic manifestations is extracted from the Wikipedia and PubMed websites; specifically, texts from these sources are processed through a combination of text mining and natural language processing techniques.

Results. We further present the validation of our system on Wikipedia and PubMed texts and the accuracy obtained. The final output includes a comprehensive symptom-disease dataset, shared (free access) through the system's API. Finally, we describe, with some simple use cases, how a user can interact with the system and extract information for subsequent analyses.

Discussion. DISNET allows retrieving knowledge about the signs, symptoms and diagnostic tests associated with a disease. It is not limited to a specific category (it covers all the categories that the selected information sources offer) or to specific clinical diagnosis terms. It also makes it possible to track the evolution of those terms over time, offering an opportunity to analyse and observe the progress of human knowledge on diseases. We further discuss the validation of the system, which suggests that it is good enough to be used to extract diseases and diagnostically relevant terms. At the same time, the evaluation revealed that improvements could be introduced to enhance the system's reliability.
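
As a rough illustration of what a "customisable disease network" built from a symptom-disease dataset might look like, the sketch below links two diseases whenever their sets of phenotypic terms overlap. Everything in it is an assumption made for illustration: the toy data, the function names `jaccard` and `build_network`, and the choice of Jaccard similarity as the edge weight are not taken from DISNET, and the DISNET API itself is not called.

```python
from itertools import combinations

# Toy disease -> phenotypic-term mapping (illustrative values, not DISNET output).
DISEASE_TERMS = {
    "influenza": {"fever", "cough", "headache"},
    "common cold": {"cough", "sore throat"},
    "migraine": {"headache", "nausea"},
}


def jaccard(a, b):
    """Jaccard similarity of two term sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_network(disease_terms, threshold=0.0):
    """Return {(disease_1, disease_2): similarity} for pairs scoring above threshold.

    The threshold is the "customisable" part of this sketch: raising it keeps
    only pairs of diseases that share a larger fraction of their terms.
    """
    edges = {}
    for d1, d2 in combinations(sorted(disease_terms), 2):
        score = jaccard(disease_terms[d1], disease_terms[d2])
        if score > threshold:
            edges[(d1, d2)] = score
    return edges


network = build_network(DISEASE_TERMS)
for (d1, d2), score in network.items():
    print(f"{d1} -- {d2}: {score:.2f}")
```

In this toy example only pairs that share at least one term become edges; a real analysis would substitute the terms extracted from Wikipedia and PubMed for the hard-coded dictionary.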
