Journal
AMERICAN JOURNAL OF THERAPEUTICS
Volume 23, Issue 3, Pages e837-e843
Publisher
LIPPINCOTT WILLIAMS & WILKINS
DOI: 10.1097/MJT.0b013e31827ab4a0
Keywords
clinical research; machine learning; unsupervised data mining; BIRCH clustering; two step clustering; outlier detection; anomaly detection
With large data files, outlier recognition requires a more sophisticated approach than traditional data plots and regression lines. In addition, the number of outliers tends to rise linearly with the sample size. The objective of this study was to examine whether balanced iterative reducing and clustering using hierarchies (BIRCH) clustering is able to detect previously unrecognized outliers. A simulated data file and a real data file were used as examples. SPSS statistical software was used for data analysis. In 50 mentally depressed persons, a regression analysis failed to detect any outliers. BIRCH analysis of these data showed, in addition to 2 clusters, a relevant outlier cluster consisting of 7 patients (14%) who did not fit in the formed clusters. In 576 iatrogenic admissions, the number of comedications was not a significant loglinear predictor of iatrogenic admission. In contrast, BIRCH analysis revealed an outlier cluster consisting of 174 patients (30%) with an extremely large number of comedications. The conclusions were as follows: (1) A systematic assessment for outliers is important in therapeutic research with large data, because the lack of it can lead to catastrophic consequences. (2) Traditional data analysis, such as regression analysis, was unable to demonstrate outliers in our examples. (3) BIRCH cluster analysis of the examples produced relevant outlier clusters of patients who did not otherwise fit in the data. (4) On theoretical grounds, BIRCH cluster analysis is particularly suitable for large datasets.
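To make the idea of an outlier cluster concrete, the following is a minimal Python sketch of the clustering-feature summary that BIRCH is built on. It is not the SPSS procedure used in the study. It is a simplified, flat (single-level) version with no CF tree and no second clustering step, and it assumes that points landing in very small subclusters can be treated as outlier candidates. The names `ClusteringFeature`, `build_subclusters`, `flag_outliers`, and the parameters `threshold` and `min_size` are hypothetical, and the data are invented toy values, not the study's data.

```python
import math


class ClusteringFeature:
    """BIRCH-style summary of a subcluster: count, linear sum, squared sum."""

    def __init__(self, dim):
        self.n = 0
        self.ls = [0.0] * dim   # per-dimension linear sum
        self.ss = 0.0           # sum of squared norms

    def add(self, point):
        self.n += 1
        for i, x in enumerate(point):
            self.ls[i] += x
        self.ss += sum(x * x for x in point)

    def centroid(self):
        return [s / self.n for s in self.ls]

    def radius_if_added(self, point):
        """Radius (RMS distance to centroid) the subcluster would have
        after absorbing `point`, computed from the summary alone."""
        n = self.n + 1
        ls = [s + x for s, x in zip(self.ls, point)]
        ss = self.ss + sum(x * x for x in point)
        r2 = ss / n - sum((s / n) ** 2 for s in ls)
        return math.sqrt(max(r2, 0.0))


def build_subclusters(points, threshold):
    """Single pass: each point joins its nearest subcluster if the radius
    stays within `threshold`; otherwise it starts a new subcluster."""
    subclusters, assignments = [], []
    for p in points:
        best, best_d = None, float("inf")
        for idx, cf in enumerate(subclusters):
            c = cf.centroid()
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(p, c)))
            if d < best_d:
                best, best_d = idx, d
        if best is not None and subclusters[best].radius_if_added(p) <= threshold:
            subclusters[best].add(p)
            assignments.append(best)
        else:
            cf = ClusteringFeature(len(p))
            cf.add(p)
            subclusters.append(cf)
            assignments.append(len(subclusters) - 1)
    return subclusters, assignments


def flag_outliers(points, threshold, min_size):
    """Assumption: points in subclusters smaller than `min_size` are
    reported as outlier candidates."""
    subclusters, assignments = build_subclusters(points, threshold)
    return [i for i, a in enumerate(assignments) if subclusters[a].n < min_size]


# Invented toy data: two tight groups plus two isolated points.
points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (-0.1, 0.0),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (4.9, 5.0), (5.0, 4.9),
    (20.0, 20.0), (-15.0, 8.0),
]
outlier_idx = flag_outliers(points, threshold=1.0, min_size=3)
print(outlier_idx)  # [10, 11]
```

In this sketch the two isolated points each end up in a subcluster of size one and are flagged, while the two dense groups are absorbed into compact summaries. Because only the summaries are kept, each point is inspected once, which is the property that makes this family of methods attractive for large data files.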