☆ 4.7 Article

A Model-Based Approach for Discrete Data Clustering and Feature Weighting Using MAP and Stochastic Complexity

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING (2009)

Journal

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING

Volume 21, Issue 12, Pages 1649-1664

Publisher

IEEE COMPUTER SOC

DOI: 10.1109/TKDE.2009.42

Keywords

Discrete data; finite mixture models; multinomial; Dirichlet prior; feature weighting/selection; MAP; stochastic complexity; Fisher kernel; image databases; text clustering

Categories

Computer Science, Artificial Intelligence; Computer Science, Information Systems; Engineering, Electrical & Electronic

Funding

Natural Sciences and Engineering Research Council of Canada (NSERC)
NATEQ (Le Fonds Quebecois de la Recherche sur la Nature et les Technologies)
Nouveaux Chercheurs Grant
Concordia University

Abstract

In this paper, we consider the problem of unsupervised discrete feature selection/weighting. Discrete data are an important component in many data mining, machine learning, image processing, and computer vision applications. However, much of the published work on unsupervised feature selection has concentrated on continuous data. We propose a probabilistic approach that assigns relevance weights to discrete features, which are treated as random variables modeled by finite discrete mixtures. The choice of finite mixture models is justified by their flexibility, which has led to their widespread application in different domains. To learn the model, we consider both Bayesian and information-theoretic approaches through stochastic complexity. Experimental results illustrate the feasibility and merits of our approach on a difficult problem: clustering and recognizing visual concepts in different image data. The proposed approach is also successfully applied to text clustering.
