Article

Machine Learning in Therapeutic Research: The Hard Work of Outlier Detection in Large Data

Journal

AMERICAN JOURNAL OF THERAPEUTICS
Volume 23, Issue 3, Pages e837-e843

Publisher

LIPPINCOTT WILLIAMS & WILKINS
DOI: 10.1097/MJT.0b013e31827ab4a0

Keywords

clinical research; machine learning; unsupervised data mining; BIRCH clustering; two step clustering; outlier detection; anomaly detection

With large data files, outlier recognition requires a more sophisticated approach than traditional data plots and regression lines. In addition, the number of outliers tends to rise linearly with sample size. The objective of this study was to examine whether balanced iterative reducing and clustering using hierarchies (BIRCH) clustering is able to detect previously unrecognized outlier data. A simulated and a real data file were used as examples. SPSS statistical software was used for data analysis. In 50 mentally depressed persons, a regression analysis failed to detect any outliers. BIRCH analysis of these data showed, in addition to 2 clusters, a relevant outlier cluster consisting of 7 patients (14%) who did not fit in the formed clusters. In 576 iatrogenic admissions, the number of comedications was not a significant loglinear predictor of iatrogenic admission. In contrast, BIRCH analysis revealed an outlier cluster consisting of 174 patients (30%) with an extremely large number of comedications. The conclusions were as follows: (1) A systematic assessment for outliers is important in therapeutic research with large data, because the lack of it can lead to catastrophic consequences. (2) Traditional data analysis, such as regression analysis, was unable to demonstrate outliers in our examples. (3) BIRCH cluster analysis of the examples produced relevant outlier clusters of patients who did not otherwise fit in the data. (4) On theoretical grounds, BIRCH cluster analysis is particularly suitable for large datasets.
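The abstract does not give the analysis as code (it was run in SPSS). As a rough illustration of the idea behind BIRCH's first phase, the sketch below summarizes cases as clustering features (N, LS, SS: count, linear sum, sum of squared norms). It places each case in the nearest entry when that entry's radius stays within a threshold and then flags the cases in entries much smaller than the largest one as an outlier group. This is a simplified, flat (non-hierarchical) variant assumed for illustration only. Real BIRCH organizes the entries in a height-balanced CF tree, and SPSS's TwoStep procedure applies its own distance measures and noise-handling rules. The names `CF`, `build_cf_entries`, and `outlier_indices`, the parameters `threshold` and `min_fraction`, and the toy data are hypothetical and are not the study's data.

```python
import math
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class CF:
    """Clustering feature (N, LS, SS): count, linear sum, and sum of squared norms."""
    n: int
    ls: List[float]
    ss: float
    members: List[int] = field(default_factory=list)  # indices of absorbed cases

    def centroid(self) -> List[float]:
        return [v / self.n for v in self.ls]

    def radius_if_added(self, x: Sequence[float]) -> float:
        # Radius after absorbing x: R^2 = SS/N - ||LS/N||^2 (clamped for rounding error).
        n = self.n + 1
        ls = [a + b for a, b in zip(self.ls, x)]
        ss = self.ss + sum(v * v for v in x)
        r2 = ss / n - sum((v / n) ** 2 for v in ls)
        return math.sqrt(max(r2, 0.0))

    def add(self, x: Sequence[float], idx: int) -> None:
        # CF additivity: absorbing a case only updates N, LS, and SS.
        self.n += 1
        self.ls = [a + b for a, b in zip(self.ls, x)]
        self.ss += sum(v * v for v in x)
        self.members.append(idx)


def build_cf_entries(points: Sequence[Point], threshold: float) -> List[CF]:
    """Single pass: absorb each case into the nearest entry if its radius stays <= threshold."""
    entries: List[CF] = []
    for i, x in enumerate(points):
        if entries:
            nearest = min(entries, key=lambda e: math.dist(e.centroid(), x))
            if nearest.radius_if_added(x) <= threshold:
                nearest.add(x, i)
                continue
        # No entry can absorb the case, so it starts a new entry.
        entries.append(CF(1, list(x), sum(v * v for v in x), [i]))
    return entries


def outlier_indices(points: Sequence[Point], threshold: float,
                    min_fraction: float = 0.25) -> List[int]:
    """Flag cases in entries smaller than min_fraction of the largest entry (hypothetical rule)."""
    entries = build_cf_entries(points, threshold)
    largest = max(e.n for e in entries)
    return sorted(i for e in entries if e.n < min_fraction * largest for i in e.members)


# Toy data (not the study's data): two tight groups plus two isolated cases.
data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05),
        (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1), (5.05, 5.05),
        (20.0, 20.0), (-10.0, 15.0)]
print(outlier_indices(data, threshold=0.5))  # [10, 11]
```

Because the entries are formed in a single pass, the result depends on `threshold` and, as in BIRCH itself, on the order of the cases. The `min_fraction` cut-off is only a rough stand-in for a noise-handling rule and is not a value taken from the study.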
