Filtering big data from social media - Building an early warning system for adverse drug reactions

JOURNAL OF BIOMEDICAL INFORMATICS (2015)

Journal

JOURNAL OF BIOMEDICAL INFORMATICS

Volume 54, Pages 230-240

Publisher

ACADEMIC PRESS INC ELSEVIER SCIENCE

DOI: 10.1016/j.jbi.2015.01.011

Keywords

Partially supervised classification; Latent Dirichlet Allocation (LDA); Adverse drug reactions; Social media filtering; Social media mining

Categories

Computer Science, Interdisciplinary Applications; Medical Informatics

Funding

Natural Science Foundation of China [71301172, 71171186, 71301175, 61272389]
Social Science Foundation of China [13AXW010]

Abstract

Objectives: Adverse drug reactions (ADRs) are believed to be a leading cause of death in the world. Pharmacovigilance systems are aimed at early detection of ADRs. With the popularity of social media, Web forums and discussion boards have become important venues for consumers to share their drug use experiences and, as a result, may provide useful information on drugs and their adverse reactions. In this study, we propose an automated mechanism for filtering ADR-related posts using text classification methods. In real-life settings, ADR-related messages are highly distributed in social media, while non-ADR-related messages are unspecific and topically diverse. It is expensive to manually label a large number of ADR-related messages (positive examples) and non-ADR-related messages (negative examples) to train classification systems. To mitigate this challenge, we examine the use of a partially supervised learning classification method to automate the process.

Methods: We propose a novel pharmacovigilance system leveraging a Latent Dirichlet Allocation modeling module and a partially supervised classification approach. We selected drugs with more than 500 threads of discussion and collected all the original posts and comments on these drugs using an automatic Web spidering program to form the text corpus. Various classifiers were trained by varying the number of positive examples and the number of topics. The trained classifiers were applied to 3000 posts published over 60 days. Top-ranked posts from each classifier were pooled, and the resulting set of 300 posts was reviewed by a domain expert to evaluate the classifiers.

Results: Compared to the alternative approaches using supervised learning methods and three general-purpose partially supervised learning methods, our approach performs significantly better in terms of precision, recall, and the F measure (the harmonic mean of precision and recall), based on a computational experiment using online discussion threads from Medhelp.

Conclusions: Our design provides satisfactory performance in identifying ADR-related posts for post-marketing drug surveillance. The overall design of our system also points to a potentially fruitful direction for building other early warning systems that need to filter big data from social media networks. (C) 2015 Elsevier Inc. All rights reserved.
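
The pooled evaluation and the F measure described in the abstract can be illustrated with a small sketch. This is not the authors' code: the function names, the toy post IDs, the cutoff `k`, and the choice to measure recall against the expert-confirmed posts in the pool are all assumptions made for illustration, since the abstract does not spell out these details.

```python
from typing import Dict, List, Set, Tuple


def pool_top_ranked(rankings: Dict[str, List[str]], k: int) -> Set[str]:
    """Union of the top-k post IDs from every classifier's ranked list."""
    pool: Set[str] = set()
    for ranked_ids in rankings.values():
        pool.update(ranked_ids[:k])
    return pool


def precision_recall_f(predicted: Set[str],
                       relevant: Set[str]) -> Tuple[float, float, float]:
    """Precision, recall, and F measure (harmonic mean of the two)."""
    true_pos = len(predicted & relevant)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical toy data: two classifiers, each ranking posts by ADR score.
rankings = {
    "clf_a": ["p1", "p2", "p3", "p4"],
    "clf_b": ["p2", "p5", "p1", "p6"],
}
k = 3  # assumed cutoff; the study pooled top-ranked posts into a set of 300

# Pool the top-k posts from every classifier for a single expert review.
pool = pool_top_ranked(rankings, k)

# Assumed expert judgement: posts in the pool confirmed as ADR related.
expert_positive = {"p1", "p2", "p5"}

# Score each classifier's top-k list against the expert labels.
for name, ranked_ids in rankings.items():
    p, r, f = precision_recall_f(set(ranked_ids[:k]), expert_positive)
    print(f"{name}: precision={p:.3f} recall={r:.3f} F={f:.3f}")
```

Pooling keeps the expert's workload bounded, because only posts that at least one classifier ranked highly are reviewed. The trade-off is that recall is then relative to the pool rather than to all ADR-related posts in the full collection.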
