Article

CLUSTER CORRESPONDENCE ANALYSIS

PSYCHOMETRIKA (2017)

Journal

PSYCHOMETRIKA

Volume 82, Issue 1, Pages 158-185

Publisher

SPRINGER

DOI: 10.1007/s11336-016-9514-0

Keywords

correspondence analysis; cluster analysis; dimension reduction; categorical data

Categories

Mathematics, Interdisciplinary Applications; Social Sciences, Mathematical Methods; Psychology, Mathematical

Abstract

A method is proposed that combines dimension reduction and cluster analysis for categorical data by simultaneously assigning individuals to clusters and optimal scaling values to categories in such a way that a single between-variance maximization objective is achieved. In a unified framework, a brief review of alternative methods is provided and we show that the proposed method is equivalent to GROUPALS applied to categorical data. Performance of the methods is appraised by means of a simulation study. The results of the joint dimension reduction and clustering methods are compared with the so-called tandem approach, a sequential analysis of dimension reduction followed by cluster analysis. The tandem approach is conjectured to perform worse when variables are added that are unrelated to the cluster structure. Our simulation study confirms this conjecture. Moreover, the results of the simulation study indicate that the proposed method also consistently outperforms alternative joint dimension reduction and clustering methods.
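
To make the comparison in the abstract concrete, the tandem baseline (dimension reduction first, cluster analysis second) can be sketched as follows. This is a minimal illustration and not the paper's proposed joint method. It assumes multiple correspondence analysis computed as a correspondence analysis of the indicator matrix, followed by a plain k-means. The function names (`indicator_matrix`, `mca_row_scores`, `kmeans`, `tandem`) and the farthest-point initialization are hypothetical choices made for this example.

```python
import numpy as np


def indicator_matrix(data):
    """One-hot encode an (n, q) array of integer category codes."""
    blocks = []
    for j in range(data.shape[1]):
        cats = np.unique(data[:, j])
        blocks.append((data[:, [j]] == cats[None, :]).astype(float))
    return np.hstack(blocks)


def mca_row_scores(Z, n_dims):
    """Row principal coordinates from a CA of the indicator matrix Z."""
    P = Z / Z.sum()                 # correspondence matrix
    r = P.sum(axis=1)               # row masses
    c = P.sum(axis=0)               # column masses
    # Standardized residuals from the independence model r c'
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, s, _ = np.linalg.svd(S, full_matrices=False)
    return (U[:, :n_dims] * s[:n_dims]) / np.sqrt(r)[:, None]


def kmeans(X, k, n_iter=100):
    """Lloyd's algorithm with a deterministic farthest-point start
    (an assumption made here only to keep the sketch reproducible)."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new_centers = np.array([
            X[labels == g].mean(axis=0) if np.any(labels == g) else centers[g]
            for g in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def tandem(data, n_dims, k):
    """Tandem approach: reduce dimensions first, then cluster the scores."""
    scores = mca_row_scores(indicator_matrix(data), n_dims)
    return kmeans(scores, k)


# Toy data: two groups that differ on every variable.
data = np.array([[0, 0, 0]] * 5 + [[1, 1, 1]] * 5)
labels = tandem(data, n_dims=1, k=2)
```

Because the two steps optimize separate criteria, the retained dimensions need not be the ones that separate the clusters. This is the weakness the abstract attributes to the tandem approach when variables unrelated to the cluster structure are added.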

Recommended

Article Mathematics

Categorical Functional Data Analysis. The cfda R Package

Cristian Preda, Quentin Grimonprez, Vincent Vandewalle

Summary: This paper discusses categorical functional data represented by paths of a stochastic jump process and extends the concept of multiple correspondence analysis. By approximating the optimal encoding of states over time, it achieves dimension reduction, optimal representation, and visualization of data. The methodology is implemented in the cfda R package and demonstrated using a real data set in the clustering framework.

MATHEMATICS (2021)

Article Environmental Sciences

Spatiotemporal Modes of Short Time Rainstorms Based on High-Dimensional Data: A Case Study of the Urban Area of Beijing, China

Wei Liu, Sheng Chen, Fuchang Tian

Summary: This study presents an approach to define typical modes of rainfall in urban areas from the temporal and spatial dimensions, and the analysis of monitoring data from multiple rainfall stations in Beijing reveals three modes of rainstorms.

WATER (2021)

Article Computer Science, Artificial Intelligence

Dimension reduction of high-dimension categorical data with two or multiple responses considering interactions between responses

Yuehan Yang

Summary: This paper focuses on modeling categorical data with two or multiple responses and proposes an efficient iterative procedure based on sufficient dimension reduction to consider the interactions between the responses. Theoretical guarantees are provided under the two- and multiple-response models, and the uniqueness of the proposed estimator is demonstrated. The proposed method is efficient in the multiple-response model and outperforms existing methods in the same models, as demonstrated through application to adult and right heart catheterization datasets.

EXPERT SYSTEMS WITH APPLICATIONS (2023)

Article Engineering, Civil

Using t-distributed Stochastic Neighbor Embedding (t-SNE) for cluster analysis and spatial zone delineation of groundwater geochemistry data

Honghua Liu, Jing Yang, Ming Ye, Scott C. James, Zhonghua Tang, Jie Dong, Tongju Xing

Summary: This study introduced t-SNE as a graphic approach to assist cluster analysis for groundwater geochemistry data. Compared to PCA, t-SNE performed better in assisting cluster analysis, showing promise as a tool for determining cluster numbers and delineating spatial zones.

JOURNAL OF HYDROLOGY (2021)

Article Computer Science, Artificial Intelligence

A categorical data clustering framework on graph representation

Liang Bai, Jiye Liang

Summary: This paper introduces a graph-based framework for clustering categorical data. The proposed method learns the representation of categorical values from their similar graph to provide similar representations for similar categorical values. Experimental results demonstrate the effectiveness of the framework compared to other methods.

PATTERN RECOGNITION (2022)

Article Computer Science, Artificial Intelligence

Dimensionality Reduction for Categorical Data

Debajyoti Bera, Rameshwar Pratap, Bhisham Dev Verma

Summary: This work focuses on compressing vectors over categorical attributes to low-dimension discrete vectors. Existing hash-based methods lack guarantees on the Hamming distances between compressed representations. FSketch is introduced to create sketches for sparse categorical data and estimate pairwise Hamming distances from the sketches. These sketches can be used instead of original data in data mining tasks without compromising the quality, thanks to their categorical, sparse nature and reasonably precise Hamming distance estimates. The single-pass algorithm is efficient and applicable to various real-life scenarios, as demonstrated by theoretical analysis and comparative evaluations on real-world datasets.

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING (2023)

Article Statistics & Probability

Visualizing Class Specific Heterogeneous Tendencies in Categorical Data

Mariko Takagishi, Michel van de Velden

Summary: In multiple correspondence analysis, a biplot can be used to depict the relationships between categories and individuals. Additional information about individuals can enhance interpretation capacities, such as including class information to facilitate the interpretation of relationships between individuals and categories.

JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS (2022)

Article Genetics & Heredity

Principal Amalgamation Analysis for Microbiome Data

Yan Li, Gen Li, Kun Chen

Summary: Microbiome studies have recently gained popularity, but due to sparse and high dimensional data, dimension reduction is often necessary. We propose a novel dimension reduction method for microbiome data called Principal Amalgamation Analysis (PAA), which aggregates compositions into principal compositions guided by the available taxonomic structure. We develop visualization tools for data visualization and demonstrate the effectiveness of PAA using real microbiome data.

GENES (2022)

Article Biochemistry & Molecular Biology

Crimp: An efficient tool for summarizing multiple clusterings in population structure analysis and beyond

Ulrich Lautenschlager

Summary: This article introduces a lightweight command-line tool called Crimp, which solves the label-switching problem. It aligns clusterings with the same number of clusters, making it easier to compare and generate an averaged clustering. Benchmark analyses show that Crimp performs well in terms of runtime requirements and solution quality, especially for larger data sets.

MOLECULAR ECOLOGY RESOURCES (2023)

Article Engineering, Industrial

Efficient and interpretable monitoring of high-dimensional categorical processes

Kai Wang, Jian Li, Fugee Tsung

Summary: This study proposes an efficient and interpretable probabilistic tensor decomposition model for monitoring high-dimensional categorical data. By decomposing a huge tensor into latent classes, the number of model parameters is dramatically reduced. To improve interpretability, a novel polarization regularization method is used. Extensive simulations and a real case study validate the superior inference and monitoring performance of the proposed method.

IISE TRANSACTIONS (2023)

Article Business, Finance

The role of categorical EPU indices in predicting stock-market returns

Juan Chen, Feng Ma, Xuemei Qiu, Tao Li

Summary: This study examines the predictive ability of categorical economic-policy uncertainty (EPU) indices for stock-market returns. The findings suggest that certain categorical EPU indices outperform the original EPU index and popular predictors in predicting stock returns, achieving higher realized utility. Moreover, diffusion indices based on EPU categories, particularly those utilizing partial least squares (PLS) to extract principal components, effectively utilize forecast information from categorical EPU indices, resulting in improved forecast performance, reduced errors, and increased economic value for investors. Additionally, categorical EPU indices demonstrate superior forecasting performance during economic expansions, the China-US trade war, and the COVID-19 pandemic.

INTERNATIONAL REVIEW OF ECONOMICS & FINANCE (2023)

Article Geography

Multivariate Neighborhood Trajectory Analysis: An Exploration of the Functional Data Analysis Approach

Paul H. Jung, Jun Song

Summary: Recent neighborhood studies have focused on longitudinal aspects of neighborhood change and data-mining methodologies. Using a functional data analysis method, neighborhood trajectory clusters are identified through mathematically represented multivariate time-dependent curves. The model has been applied to the Charlotte and Detroit areas to analyze ongoing racial and socioeconomic segregation patterns and the time dynamics of neighborhood change.

GEOGRAPHICAL ANALYSIS (2022)

Article Transportation

Understanding patterns of moped and seated motor scooter (50 cc or less) involved fatal crashes using cluster correspondence analysis

Subasish Das, Md Mahmud Hossain, M. Ashifur Rahman, Xiaoqiang Kong, Xiaoduan Sun, G. M. Al Mamun

Summary: Moped and seated motor scooter riders have a high risk of being involved in crash accidents. A study analyzed fatal crash data and identified critical clusters in order to develop interventions to minimize collisions and fatalities.

TRANSPORTMETRICA A-TRANSPORT SCIENCE (2023)

Article Computer Science, Artificial Intelligence

Efficient binary embedding of categorical data using BinSketch

Bhisham Dev Verma, Rameshwar Pratap, Debajyoti Bera

Summary: This paper presents a dimensionality reduction algorithm for categorical datasets, which constructs low-dimensional binary sketches from high-dimensional categorical vectors and approximates the Hamming distance between any two original vectors. The approach is particularly useful for sparse datasets and has been rigorously analyzed and experimentally validated.

DATA MINING AND KNOWLEDGE DISCOVERY (2022)

Article Computer Science, Interdisciplinary Applications

Partial sufficient variable screening with categorical controls

Chenlu Ke, Wei Yang, Qingcong Yuan, Lu Li

Summary: Variable screening is an important tool for dimension reduction in ultrahigh dimensional data analysis. This study proposes a partial sufficient variable screening method for the presence of control variables, which aims to reduce the predictive set without losing regression information. The method achieves variable screening by constraining the reduction of continuous variables using the subpopulations identified by categorical variables. The effectiveness of the method is demonstrated through simulation studies and an application in gene screening for diffuse large-B-cell lymphoma prognosis.

COMPUTATIONAL STATISTICS & DATA ANALYSIS (2023)

Article Neurosciences

E-TAN, a technology-enhanced platform with tangible objects for the assessment of visual neglect: A multiple single-case study

Antonio Cerrato, Daniela Pacella, Francesco Palumbo, Diane Beauvais, Michela Ponticorvo, Orazio Miglino, Paolo Bartolomeo

Summary: E-TAN is a technology-enhanced platform for assessing visual neglect patients, which records the location, sequence, and timing of objects through an automatized process, effectively discriminating patients with visual neglect from those without. Patients can use this platform at home to track the effects of rehabilitation.

NEUROPSYCHOLOGICAL REHABILITATION (2021)

Article Statistics & Probability

Partial possibilistic regression path modeling: handling uncertainty in path modeling

Rosaria Romano, Francesco Palumbo

Summary: The paper introduces a new method called partial possibilistic regression path modeling, which combines principles of path modeling and possibilistic regression to model relationships among blocks of variables. The comparison with a classical composite-based path model is based on a simulation study, while a case study on the use of Wikipedia in higher education illustrates the usability context of the proposed method.

COMPUTATIONAL STATISTICS (2021)

Article Statistics & Probability

Chunk-wise regularised PCA-based imputation of missing data

A. Iodice D'Enza, A. Markos, F. Palumbo

Summary: Two chunk-wise implementations of RPCA suitable for tall data sets are proposed in this paper, with one for distributed computation and the other for incremental computation. Experimental results show that the distributed approach performs similarly to batch RPCA for data with completely random missing entries, while the incremental approach shows good performance for data with non-completely random missing entries if the first analyzed chunks contain sufficient information on the data structure.

STATISTICAL METHODS AND APPLICATIONS (2022)

Article Statistics & Probability

Least-squares bilinear clustering of three-way data

Pieter C. Schoonees, Patrick J. F. Groenen, Michel van de Velden

Summary: A least-squares bilinear clustering framework is introduced to model three-way data, combining bilinear decompositions of two-way matrices with clustering over observations. Different clusterings are defined for each part of the bilinear decomposition, reducing computational burden. The method simplifies the joint clustering problem into separate problems that can be handled independently.

ADVANCES IN DATA ANALYSIS AND CLASSIFICATION (2022)

Review Oncology

Factors promoting breast, cervical and colorectal cancer screenings participation: A systematic review

Federica Vallone, Daniela Lemmo, Maria Luisa Martino, Anna Rosa Donizzetti, Maria Francesca Freda, Francesco Palumbo, Elvira Lorenzo, Angelo D'Argenzio, Daniela Caso

Summary: This study systematically reviewed factors promoting breast, cervical and colorectal cancer screening participation. By analyzing 102 studies, multiple individual-level, relational-level, and healthcare system-level factors were identified, which could potentially enhance the participation in screenings worldwide. The fragmentation of research highlights the need for better integration of study results.

PSYCHO-ONCOLOGY (2022)

Article Computer Science, Artificial Intelligence

Clustering mixed-type data using a probabilistic distance algorithm

Cristina Tortora, Francesco Palumbo

Summary: This paper discusses a probabilistic distance clustering method adjusted for cluster size (PDQ) for handling mixed-type data, shows its advantages through a simulation design, and applies it to a real data set.

APPLIED SOFT COMPUTING (2022)

Article Statistics & Probability

A hybrid approach for the analysis of complex categorical data structures: assessment of latent distance learning perception in higher education

Maria Iannario, Alfonso Iodice D'Enza, Rosaria Romano

Summary: This article introduces a method for dealing with ordinal response data, which tackles the problem by synthesizing multiple item responses into a meta-item and modeling the meta-item using regression approaches. Variable selection is performed automatically using a recursive partitioning method based on trees.

COMPUTATIONAL STATISTICS (2022)

Article Clinical Neurology

Predicting the early visual outcomes in sellar-suprasellar lesions compressing the chiasm: the role of SD-OCT series of 20 patients operated via endoscopic endonasal approach

Domenico Solari, Gilda Cennamo, Francesca Amoroso, Federico Frio, Piero Donna, Alfonso Iodice D'Enza, Antonietta Melenzane, Teresa Somma, Fausto Tranfa, Luigi M. Cavallo

Summary: This study investigated the relationship between postoperative visual recovery and SD-OCT and best corrected visual acuity in patients with compression of the optic chiasm and nerve. The results showed that most patients experienced varying degrees of visual and visual field improvements after surgery. There was a direct correlation between preoperative retinal status and functional recovery, which can be used to predict postoperative visual recovery.

JOURNAL OF NEUROSURGICAL SCIENCES (2022)

Review Psychology, Clinical

Clinical and psychosocial constructs for breast, cervical, and colorectal cancer screening participation: A systematic review

Daniela Lemmo, Maria Luisa Martino, Federica Vallone, Anna Rosa Donizzetti, Maria Francesca Freda, Francesco Palumbo, Elvira Lorenzo, Angelo D'Argenzio, Daniela Caso

Summary: Research has found various psychosocial factors associated with ongoing cancer screenings, but a systematic review on the theoretical frameworks and constructs used in studies on breast, cervical, and colorectal cancer screening participation has not been conducted. This study aimed to identify the main theoretical frameworks and constructs in the literature over the past five years to explain cancer screening participation. A search of databases was conducted according to PRISMA guidelines and 24 articles met the inclusion criteria. The findings highlight the diverse theoretical frameworks and constructs used to predict or promote cancer screening adherence and emphasize the need for further research to improve screening promotion interventions.

INTERNATIONAL JOURNAL OF CLINICAL AND HEALTH PSYCHOLOGY (2023)

Correction Multidisciplinary Sciences

Principal component analysis (vol 2, 100, 2022)

Michael Greenacre, Patrick J. F. Groenen, Trevor Hastie, Alfonso Iodice D'Enza, Angelos Markos, Elena Tuzhilina

NATURE REVIEWS METHODS PRIMERS (2023)

Article Multidisciplinary Sciences

Principal component analysis

Michael Greenacre, Patrick J. F. Groenen, Trevor Hastie, Alfonso Iodice D'Enza, Angelos Markos, Elena Tuzhilina

Summary: Principal component analysis is a versatile statistical method that reduces a large data table to its essential features. It explains the variance of the data by finding major components and supports graphical interpretation. Additionally, it can be used for handling incomplete data matrices and analyzing images, shapes, and functions.

NATURE REVIEWS METHODS PRIMERS (2022)

Article Psychology, Applied

GAUSSIAN MIXTURE MODELS FOR THE ANALYSIS OF WISC-IV DIMENSIONS: A MULTIVARIATE APPROACH TO IMPROVE THE ASSESSMENT OF INTELLECTUAL FUNCTIONING

Rosa Fabbricatore, Davide Marocco, Francesco Palumbo, Serafino Buono, Santo Di Nuovo

TPM-TESTING PSYCHOMETRICS METHODOLOGY IN APPLIED PSYCHOLOGY (2020)

Article Computer Science, Interdisciplinary Applications

Beyond Tandem Analysis: Joint Dimension Reduction and Clustering in R

Angelos Markos, Alfonso Iodice D'Enza, Michel van de Velden

JOURNAL OF STATISTICAL SOFTWARE (2019)
