4.6 Review

On the number of components in a Gaussian mixture model

Publisher

WILEY PERIODICALS, INC
DOI: 10.1002/widm.1135

Keywords

-

Mixture distributions, in particular normal mixtures, are applied to data with two main purposes in mind. One is to provide an appealing semiparametric framework in which to model unknown distributional shapes, as an alternative to, say, the kernel density method. The other is to use the mixture model to provide a probabilistic clustering of the data into g clusters corresponding to the g components in the mixture model. In both situations, there is the question of how many components to include in the normal mixture model. We review various methods that have been proposed to answer this question.

WIREs Data Mining Knowl Discov 2014, 4:341-355. doi: 10.1002/widm.1135

Conflict of interest: The authors have declared no conflicts of interest for this article.
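To make the question of how many components to include concrete, here is a minimal sketch of one widely used approach: fit the mixture for several candidate values of g and keep the one with the smallest Bayesian information criterion (BIC). The abstract does not say which methods the review covers or favors, so BIC is an illustrative choice here. The function names `fit_gmm_1d` and `select_g_by_bic`, the simulated data, the quantile-based starting values, and the variance floor are all assumptions made for this example.

```python
import math
import random


def fit_gmm_1d(x, g, n_iter=100, var_floor=1e-3):
    """Fit a g-component univariate normal mixture by EM; return the log-likelihood."""
    n = len(x)
    xs = sorted(x)
    # Deterministic start (an assumption of this sketch): means at evenly
    # spaced sample quantiles, common variance, equal mixing proportions.
    means = [xs[int((2 * k + 1) * n / (2 * g))] for k in range(g)]
    mean_all = sum(x) / n
    var_all = sum((v - mean_all) ** 2 for v in x) / n
    variances = [var_all] * g
    weights = [1.0 / g] * g

    def log_joint(v):
        # log(pi_k * N(v | mu_k, sigma_k^2)) for each component k
        return [
            math.log(w) - 0.5 * math.log(2 * math.pi * s2) - 0.5 * (v - m) ** 2 / s2
            for w, m, s2 in zip(weights, means, variances)
        ]

    def log_sum_exp(a):
        top = max(a)
        return top + math.log(sum(math.exp(t - top) for t in a))

    for _ in range(n_iter):
        # E-step: posterior probabilities of component membership.
        resp = []
        for v in x:
            lj = log_joint(v)
            lse = log_sum_exp(lj)
            resp.append([math.exp(t - lse) for t in lj])
        # M-step: update mixing proportions, means, and variances.
        for k in range(g):
            nk = max(sum(r[k] for r in resp), 1e-10)  # guard against an empty component
            weights[k] = nk / n
            means[k] = sum(r[k] * v for r, v in zip(resp, x)) / nk
            var_k = sum(r[k] * (v - means[k]) ** 2 for r, v in zip(resp, x)) / nk
            variances[k] = max(var_floor, var_k)  # floor avoids degenerate spikes

    return sum(log_sum_exp(log_joint(v)) for v in x)


def select_g_by_bic(x, max_g=4):
    """Return (best_g, bic), where bic maps each candidate g to its BIC value."""
    n = len(x)
    bic = {}
    for g in range(1, max_g + 1):
        n_params = 3 * g - 1  # (g - 1) proportions + g means + g variances
        bic[g] = -2.0 * fit_gmm_1d(x, g) + n_params * math.log(n)
    return min(bic, key=bic.get), bic


# Simulated data (hypothetical): two well-separated normal components.
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(200)] + [rng.gauss(10.0, 1.0) for _ in range(200)]
best_g, bic = select_g_by_bic(data)
print("BIC by g:", {g: round(v, 1) for g, v in bic.items()})
print("selected g:", best_g)
```

The variance floor is a practical safeguard rather than part of BIC itself: the normal mixture likelihood is unbounded when a component collapses onto a single observation, so some constraint is needed for the maximized log-likelihood to be meaningful.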

Recommended

Article Statistics & Probability

Skew-normal generalized spatial panel data model

Mohadeseh Alsadat Farzammehr, Mohammad Reza Zadkarami, Geoffrey J. McLachlan

Summary: Traditional spatial panel data models typically assume a normal distribution for random error components, which may not be appropriate in many applications. A more flexible approach, the skew-normal generalized spatial panel data model, is proposed here, using a multivariate skew normal distribution for random error components. A Bayesian inference algorithm is developed for parameter estimation, and comparison with the traditional (normal) spatial model is conducted through simulation and analysis of real data on cigarette demand.

COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION (2021)

Article Computer Science, Theory & Methods

Mini-batch learning of exponential family finite mixture models

Hien D. Nguyen, Florence Forbes, Geoffrey J. McLachlan

STATISTICS AND COMPUTING (2020)

Article Statistics & Probability

Mixtures of factor analyzers with scale mixtures of fundamental skew normal distributions

Sharon X. Lee, Tsung-I Lin, Geoffrey J. McLachlan

Summary: Mixtures of factor analyzers (MFA) are a powerful tool for modeling high-dimensional datasets, with recent generalizations allowing for skewness in the data. The proposed new model based on scale mixtures of canonical fundamental skew normal distributions can capture various types of skewness and asymmetry, accommodating multiple directions of skewness. Parameter estimation for this model can be carried out using maximum likelihood via an EM-type algorithm, and its usefulness and potential have been demonstrated using four real datasets.

ADVANCES IN DATA ANALYSIS AND CLASSIFICATION (2021)

Article Statistics & Probability

On formulations of skew factor models: Skew factors and/or skew errors

Sharon X. Lee, Geoffrey J. McLachlan

Summary: In recent years, several mixtures of skew factor analyzers have been proposed with various skew distributions for either the factors or the errors. This paper examines the connections between these formulations and introduces a unified model that allows for skewness in both the factors and errors.

STATISTICS & PROBABILITY LETTERS (2021)

Article Statistics & Probability

Bayesian analysis of generalized linear mixed models with spatial correlated and unrestricted skew normal errors

Mohadeseh Alsadat Farzammehr, Mohsen Mohammadzadeh, Mohammad Reza Zadkarami, Geoffrey J. McLachlan

Summary: This research relaxes the normality assumption of a generalized linear mixed model by using an unrestricted multivariate skew-normal distribution. Parameter estimation is done through a Bayesian inference algorithm, and the proposed skew normal spatial mixed model is compared with the normal spatial mixed model through simulation studies and analysis of real data.

COMMUNICATIONS IN STATISTICS-THEORY AND METHODS (2022)

Article Computer Science, Interdisciplinary Applications

Harmless label noise and informative soft-labels in supervised classification

Daniel Ahfock, Geoffrey J. McLachlan

Summary: In supervised learning, manual labelling of training examples often introduces label noise, but logistic regression can be more robust to label errors when label noise is positively correlated with classification difficulty, improving classification accuracy.

COMPUTATIONAL STATISTICS & DATA ANALYSIS (2021)

Article Computer Science, Theory & Methods

Data fusion using factor analysis and low-rank matrix completion

Daniel Ahfock, Saumyadipta Pyne, Geoffrey J. McLachlan

Summary: Data fusion involves integrating multiple related datasets. The statistical file-matching problem is a classic problem in multivariate analysis, and factor analysis models' low-rank structure can be used to estimate the full covariance matrix, providing better performance for file-matching problems.

STATISTICS AND COMPUTING (2021)

Article Computer Science, Artificial Intelligence

Multi-node Expectation-Maximization algorithm for finite mixture models

Sharon X. Lee, Geoffrey J. McLachlan, Kaleb L. Leemaqz

Summary: Finite mixture models are powerful tools for modeling and analyzing heterogeneous data, and recent trends show a shift towards using more flexible distributions. This paper presents a parallel implementation of the EM algorithm for these models, suitable for various processors and systems, with numerical experiments and comparisons across different platforms.

STATISTICAL ANALYSIS AND DATA MINING (2021)

Article Statistics & Probability

Robust clustering based on finite mixture of multivariate fragmental distributions

Mohsen Maleki, Geoffrey J. McLachlan, Sharon X. Lee

Summary: This paper introduces a flexible class of multivariate distributions called scale mixtures of fragmental normal (SMFN) distributions. It proposes an extension to the case of a finite mixture of SMFN (FM-SMFN) distributions. The SMFN family of distributions is convenient and effective for modeling data with skewness, discrepant observations, and population heterogeneity. It also possesses other desirable properties, such as an analytically tractable density and ease of computation for simulation and estimation of parameters.

STATISTICAL MODELLING (2023)

Article Statistics & Probability

Approximation of probability density functions via location-scale finite mixtures in Lebesgue spaces

TrungTin Nguyen, Faicel Chamroukhi, Hien D. Nguyen, Geoffrey J. McLachlan

Summary: The class of location-scale finite mixtures is of enduring interest in both applied and theoretical probability and statistics. The paper establishes and proves the following results: (a) location-scale mixtures of a continuous probability density function (PDF) can uniformly approximate any continuous PDF on a compact set with arbitrary accuracy; and (b) for any finite p ≥ 1, location-scale mixtures of an essentially bounded PDF can approximate any PDF in the L_p norm.

COMMUNICATIONS IN STATISTICS-THEORY AND METHODS (2023)

Article Computer Science, Interdisciplinary Applications

Statistical file-matching of non-Gaussian data: A game theoretic approach

Daniel Ahfock, Saumyadipta Pyne, Geoffrey J. McLachlan

Summary: The statistical file-matching problem involves data integration with structured missing data, where imputation methods can be nonparametric or parametric. Game theory is used to study the identification problem and establish a general characterization of the minimax optimal strategy. Comparisons show that using the minimax optimal strategy for imputation can better preserve the joint distribution of variables compared to standard algorithms.

COMPUTATIONAL STATISTICS & DATA ANALYSIS (2022)

Article Statistics & Probability

An overview of skew distributions in model-based clustering

Sharon X. Lee, Geoffrey J. McLachlan

Summary: The literature on non-normal model-based clustering has been expanding in recent years. These models often use a mixture of component densities to provide flexibility in distributional shapes and handle skewness. Skewing is typically achieved by introducing latent variables or considering marginal transformations of the original variables.

JOURNAL OF MULTIVARIATE ANALYSIS (2022)

Article Statistics & Probability

A spatial heterogeneity mixed model with skew-elliptical distributions

Mohadeseh Alsadat Farzammehr, Geoffrey J. McLachlan

Summary: The distribution of observations in most econometric studies with spatial heterogeneity is skewed, and the normality assumption is not always appropriate. This study relaxes the normality assumption in spatial mixed models and allows for spatial heterogeneity. Bayesian mixed modeling with a multivariate skew-elliptical distribution is used for inference, and the proposed model is shown to be superior to conventional ones based on a simulation study and empirical evidence.

COMMUNICATIONS FOR STATISTICAL APPLICATIONS AND METHODS (2022)

Article Statistics & Probability

Order selection with confidence for finite mixture models

Hien D. Nguyen, Daniel Fryer, Geoffrey J. McLachlan

Summary: The study addresses the problem of determining the number of mixture components in finite mixture models, proposing a method based on a sequential testing procedure. Through simulation studies and real data examples, the performance of the proposed method is demonstrated, providing practical recommendations for its application.

JOURNAL OF THE KOREAN STATISTICAL SOCIETY (2022)
