Model-based clustering of high-dimensional data: Variable selection versus facet determination

INTERNATIONAL JOURNAL OF APPROXIMATE REASONING (2013)

Journal

INTERNATIONAL JOURNAL OF APPROXIMATE REASONING

Volume 54, Issue 1, Pages 196-215

Publisher

ELSEVIER SCIENCE INC

DOI: 10.1016/j.ijar.2012.08.001

Keywords

Model-based clustering; Facet determination; Variable selection; Latent tree models; Gaussian mixture models

Funding

National Basic Research Program of China (aka the 973 Program) [2011CB505101]
HKUST Fok Ying Tung Graduate School

Abstract

Variable selection is an important and difficult problem in cluster analysis of high-dimensional data. The difficulty originates not only from the lack of class information but also from the fact that high-dimensional data are often multifaceted and can be meaningfully clustered in multiple ways. In such cases, the effort to find a single subset of attributes that presumably gives the best clustering may be misguided. It makes more sense to identify the various facets of a data set (each based on a subset of attributes), cluster the data along each one, and present the results to domain experts for appraisal and selection. In this paper, we propose a generalization of Gaussian mixture models and demonstrate its ability to automatically identify natural facets of data and cluster the data along each of those facets simultaneously. We present empirical results showing that facet determination usually leads to better clustering results than variable selection. © 2012 Elsevier Inc. All rights reserved.
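
The idea of clustering the same records along more than one facet can be illustrated with a small sketch. It is not the model proposed in the paper: the two facets are fixed by hand as single attributes instead of being discovered automatically, each facet gets an ordinary two-component one-dimensional Gaussian mixture fitted by EM, and the data are synthetic. The names `fit_gmm_1d`, `agreement`, `attr_a` and `attr_b` are hypothetical and exist only for this illustration.

```python
import math
import random


def fit_gmm_1d(xs, n_iter=50):
    """Fit a two-component 1-D Gaussian mixture by EM and return hard labels."""
    n = len(xs)
    mean = sum(xs) / n
    pooled_var = sum((x - mean) ** 2 for x in xs) / n
    # Deterministic start: component 0 at the smallest value, component 1 at
    # the largest, so label 0 tends to mark the lower cluster.
    mu = [min(xs), max(xs)]
    var = [pooled_var, pooled_var]
    weight = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in xs:
            dens = [
                weight[k]
                * math.exp(-((x - mu[k]) ** 2) / (2.0 * var[k]))
                / math.sqrt(2.0 * math.pi * var[k])
                for k in range(2)
            ]
            total = dens[0] + dens[1]
            resp.append([dens[0] / total, dens[1] / total])
        # M-step: re-estimate weights, means and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weight[k] = nk / n
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            spread = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk
            var[k] = max(spread, 1e-6)  # floor keeps the variance positive
    return [0 if r[0] >= r[1] else 1 for r in resp]


def agreement(u, v):
    """Fraction of positions where two label lists coincide."""
    return sum(a == b for a, b in zip(u, v)) / len(u)


# Synthetic data (an assumption of this sketch): two independent binary
# groupings, each one visible in a different attribute.
rng = random.Random(0)
latent_a = [i % 2 for i in range(200)]
latent_b = [(i // 2) % 2 for i in range(200)]
attr_a = [rng.gauss(5.0 * a, 0.5) for a in latent_a]
attr_b = [rng.gauss(5.0 * b, 0.5) for b in latent_b]

# One clustering per facet instead of a single clustering for all attributes.
facet_a_labels = fit_gmm_1d(attr_a)
facet_b_labels = fit_gmm_1d(attr_b)

print("facet A vs latent A:", agreement(facet_a_labels, latent_a))
print("facet B vs latent B:", agreement(facet_b_labels, latent_b))
print("facet A vs facet B:", agreement(facet_a_labels, facet_b_labels))
```

In this toy setting the two facets yield two different partitions of the same records, each tracking its own latent grouping, which a single partition built on one selected attribute subset could not express at once.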
