Article

Machine Learning in Therapeutic Research: The Hard Work of Outlier Detection in Large Data

AMERICAN JOURNAL OF THERAPEUTICS (2016)

Journal

AMERICAN JOURNAL OF THERAPEUTICS

Volume 23, Issue 3, Pages e837-e843

Publisher

LIPPINCOTT WILLIAMS & WILKINS

DOI: 10.1097/MJT.0b013e31827ab4a0

Keywords

clinical research; machine learning; unsupervised data mining; BIRCH clustering; two step clustering; outlier detection; anomaly detection

Category

Pharmacology & Pharmacy

Abstract

With large data files, outlier recognition requires a more sophisticated approach than the traditional data plots and regression lines. In addition, the number of outliers tends to rise linearly with the sample size. The objective of this study was to examine whether balanced iterative reducing and clustering using hierarchies (BIRCH) clustering is able to detect previously unrecognized outlier data. A simulated and a real data file were used as examples. SPSS statistical software was used for data analysis. In 50 mentally depressed persons, a regression analysis failed to detect any outliers. BIRCH analysis of these data showed, in addition to 2 clusters, a relevant outlier cluster consisting of 7 patients (14%) who did not fit in the formed clusters. In 576 iatrogenic admissions, the number of comedications was not a significant loglinear predictor of iatrogenic admission. In contrast, BIRCH analysis revealed an outlier cluster consisting of 174 patients (30%) with extremely many comedications. The conclusions were as follows: (1) A systematic assessment for outliers is important in therapeutic research with large data, because the lack of it can lead to catastrophic consequences. (2) Traditional data analysis, such as regression analysis, was unable to demonstrate outliers in our examples. (3) BIRCH cluster analysis of the examples produced relevant outlier clusters of patients who did not otherwise fit in the data. (4) On theoretical grounds, BIRCH cluster analysis is particularly suitable for large datasets.
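To illustrate how a BIRCH-style pass can surface an outlier group, the following is a minimal sketch and not the SPSS procedure used in the study. It keeps only the clustering-feature idea (each subcluster is summarized by a count, a linear sum, and a sum of squares) in a single flat pass, without the hierarchical CF tree of full BIRCH. The data are synthetic, and the names `ClusteringFeature`, `cf_outliers`, `threshold`, and `min_fraction` are hypothetical choices made for this example.

```python
import math


class ClusteringFeature:
    """Compact summary of one subcluster: count, linear sum, sum of squares."""

    def __init__(self, dim):
        self.n = 0
        self.linear_sum = [0.0] * dim
        self.square_sum = 0.0
        self.members = []  # row indices, kept only so outliers can be reported

    def centroid(self):
        return [s / self.n for s in self.linear_sum]

    def radius_if_added(self, point):
        """Radius the subcluster would have after absorbing `point`."""
        n = self.n + 1
        ls = [s + x for s, x in zip(self.linear_sum, point)]
        ss = self.square_sum + sum(x * x for x in point)
        # Mean squared distance to the centroid: SS/n - ||LS/n||^2
        r2 = ss / n - sum((s / n) ** 2 for s in ls)
        return math.sqrt(max(r2, 0.0))

    def add(self, index, point):
        self.n += 1
        self.linear_sum = [s + x for s, x in zip(self.linear_sum, point)]
        self.square_sum += sum(x * x for x in point)
        self.members.append(index)


def cf_outliers(points, threshold, min_fraction):
    """Single flat pass (assumption: no CF tree, no global refinement step).

    Each point joins its nearest subcluster if the resulting radius stays
    within `threshold`; otherwise it starts a new subcluster. Subclusters
    holding fewer than `min_fraction` of all points are reported as outliers.
    """
    subclusters = []
    for index, point in enumerate(points):
        nearest = min(
            subclusters,
            key=lambda cf: math.dist(cf.centroid(), point),
            default=None,
        )
        if nearest is not None and nearest.radius_if_added(point) <= threshold:
            nearest.add(index, point)
        else:
            cf = ClusteringFeature(len(point))
            cf.add(index, point)
            subclusters.append(cf)
    min_size = min_fraction * len(points)
    outliers = sorted(
        i for cf in subclusters if cf.n < min_size for i in cf.members
    )
    return subclusters, outliers


# Synthetic data (not the study data): two dense groups and three stray points.
group_a = [(0.1 * i, 0.1 * j) for i in range(5) for j in range(4)]
group_b = [(10 + 0.1 * i, 10 + 0.1 * j) for i in range(5) for j in range(4)]
strays = [(5.0, 30.0), (-20.0, 5.0), (40.0, 40.0)]
points = group_a + group_b + strays

subclusters, outliers = cf_outliers(points, threshold=1.0, min_fraction=0.1)
print("subcluster sizes:", [cf.n for cf in subclusters])
print("outlier rows:", outliers)
```

On this toy input the two dense groups each collapse into one subcluster of 20 points, and the three stray rows (indices 40 to 42) end up in singleton subclusters that fall below the size cutoff. Both `threshold` and `min_fraction` are tuning choices; real clinical data would need them set with care, and a production analysis would more likely rely on an established implementation.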
