Article

Algorithms for additive clustering of rectangular data tables

Journal

COMPUTATIONAL STATISTICS & DATA ANALYSIS
Volume 52, Issue 11, Pages 4923-4938

Publisher

ELSEVIER SCIENCE BV
DOI: 10.1016/j.csda.2008.04.014

The overlapping additive clustering model, or principal cluster model, is a model for two-way two-mode object-by-variable data that implies an overlapping clustering of the objects and a set of profiles (characteristic variable values for each cluster). The model values of the variables for an object are the sum of the profiles of the clusters to which it belongs. In the associated data analysis, the data matrix at hand is approximated by an overlapping additive clustering model of a prespecified rank by minimizing a least squares loss function. An algorithm was recently proposed for this purpose: a sequential fitting strategy, also called the method of principal clusters (PCL). Theoretical and empirical evidence is presented that the PCL algorithm may have problems in revealing the true structure underlying a data set. As a way out, three new algorithms to fit the principal cluster model to empirical data are presented: two of an alternating least squares (ALS) type, orthogonally combined with two different starting strategies, and one based on simulated annealing (SA). A simulation study demonstrates that all three new algorithms outperform the existing PCL algorithm. The number of objects that belong to more than one cluster (the overlap) is further found to have a considerable influence on the performance of the ALS algorithms, with low overlap requiring a different starting strategy than high overlap. Consequently, for the analysis of real data sets in practice, a hybrid approach is presented that consists of one of the ALS algorithms initialized by means of the two starting strategies under study. © 2008 Elsevier B.V. All rights reserved.
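The model structure described in the abstract can be illustrated with a small sketch. This is not the authors' PCL, ALS, or SA implementation, whose details the abstract does not give. It assumes a binary objects-by-clusters membership matrix `A`, a clusters-by-variables profile matrix `P`, and a data matrix `X`, so that the model values are `A @ P` and the loss is the sum of squared residuals. The alternating scheme shown is a generic one chosen for illustration: profiles are updated by ordinary least squares, and memberships by enumerating all binary patterns per object, which is only practical for a small number of clusters. All function names are hypothetical.

```python
import itertools

import numpy as np


def model_values(A, P):
    """Each object's model row is the sum of the profiles of its clusters."""
    return A @ P


def loss(X, A, P):
    """Least squares loss: sum of squared residuals between data and model."""
    return float(np.sum((X - model_values(A, P)) ** 2))


def update_profiles(X, A):
    """Least squares profiles for fixed memberships (assumed update step)."""
    P, *_ = np.linalg.lstsq(A.astype(float), X, rcond=None)
    return P


def update_memberships(X, P):
    """Best binary membership row per object for fixed profiles.

    Enumerates all 2^K patterns, so this is only feasible for small K.
    """
    K = P.shape[0]
    patterns = np.array(list(itertools.product([0, 1], repeat=K)), dtype=float)
    fitted = patterns @ P  # model row implied by each pattern
    # Squared distance from every object to every candidate model row.
    dist = ((X[:, None, :] - fitted[None, :, :]) ** 2).sum(axis=2)
    return patterns[dist.argmin(axis=1)]


def als(X, A0, n_iter=50):
    """Generic alternating scheme from a starting membership matrix A0."""
    A = A0.astype(float)
    for _ in range(n_iter):
        P = update_profiles(X, A)
        A_new = update_memberships(X, P)
        if np.array_equal(A_new, A):  # memberships stable: stop
            break
        A = A_new
    return A, update_profiles(X, A)


# Toy data: object 3 belongs to both clusters, object 4 to none.
A_true = np.array([[1, 0], [0, 1], [1, 1], [0, 0]], dtype=float)
P_true = np.array([[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]])
X = model_values(A_true, P_true)
```

Neither update step can increase the loss, so the scheme converges, but only to a local optimum that depends on the starting memberships `A0`. That dependence is consistent with the abstract's remark that the appropriate starting strategy varies with the amount of overlap.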

