Article
Mathematics
Cristian Preda, Quentin Grimonprez, Vincent Vandewalle
Summary: This paper discusses categorical functional data represented by paths of a stochastic jump process and extends the concept of multiple correspondence analysis. By approximating the optimal encoding of states over time, it achieves dimension reduction, optimal representation, and visualization of data. The methodology is implemented in the cfda R package and demonstrated using a real data set in the clustering framework.
Article
Environmental Sciences
Wei Liu, Sheng Chen, Fuchang Tian
Summary: This study presents an approach for defining typical rainfall modes in urban areas along the temporal and spatial dimensions. Analysis of monitoring data from multiple rainfall stations in Beijing reveals three rainstorm modes.
Article
Computer Science, Artificial Intelligence
Yuehan Yang
Summary: This paper focuses on modeling categorical data with two or multiple responses and proposes an efficient iterative procedure based on sufficient dimension reduction to consider the interactions between the responses. Theoretical guarantees are provided under the two- and multiple-response models, and the uniqueness of the proposed estimator is demonstrated. The proposed method is efficient in the multiple-response model and outperforms existing methods under the same models, as demonstrated through application to the adult and right heart catheterization datasets.
EXPERT SYSTEMS WITH APPLICATIONS
(2023)
Article
Engineering, Civil
Honghua Liu, Jing Yang, Ming Ye, Scott C. James, Zhonghua Tang, Jie Dong, Tongju Xing
Summary: This study introduced t-SNE as a graphic approach to assist cluster analysis for groundwater geochemistry data. Compared to PCA, t-SNE performed better in assisting cluster analysis, showing promise as a tool for determining cluster numbers and delineating spatial zones.
JOURNAL OF HYDROLOGY
(2021)
Article
Computer Science, Artificial Intelligence
Liang Bai, Jiye Liang
Summary: This paper introduces a graph-based framework for clustering categorical data. The proposed method learns the representation of categorical values from their similarity graph, so that similar categorical values receive similar representations. Experimental results demonstrate the effectiveness of the framework compared to other methods.
PATTERN RECOGNITION
(2022)
Article
Computer Science, Artificial Intelligence
Debajyoti Bera, Rameshwar Pratap, Bhisham Dev Verma
Summary: This work focuses on compressing vectors over categorical attributes to low-dimension discrete vectors. Existing hash-based methods lack guarantees on the Hamming distances between compressed representations. FSketch is introduced to create sketches for sparse categorical data and estimate pairwise Hamming distances from the sketches. These sketches can be used instead of original data in data mining tasks without compromising the quality, thanks to their categorical, sparse nature and reasonably precise Hamming distance estimates. The single-pass algorithm is efficient and applicable to various real-life scenarios, as demonstrated by theoretical analysis and comparative evaluations on real-world datasets.
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING
(2023)
Article
Statistics & Probability
Mariko Takagishi, Michel van de Velden
Summary: In multiple correspondence analysis, a biplot can be used to depict the relationships between categories and individuals. Additional information about individuals can enhance interpretation capacities, such as including class information to facilitate the interpretation of relationships between individuals and categories.
JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS
(2022)
Article
Genetics & Heredity
Yan Li, Gen Li, Kun Chen
Summary: Microbiome studies have recently gained popularity, but because the data are sparse and high dimensional, dimension reduction is often necessary. We propose a novel dimension reduction method for microbiome data called Principal Amalgamation Analysis (PAA), which aggregates compositions into principal compositions guided by the available taxonomic structure. We develop visualization tools and demonstrate the effectiveness of PAA using real microbiome data.
Article
Biochemistry & Molecular Biology
Ulrich Lautenschlager
Summary: This article introduces a lightweight command-line tool called Crimp, which solves the label-switching problem. It aligns clusterings with the same number of clusters, making it easier to compare and generate an averaged clustering. Benchmark analyses show that Crimp performs well in terms of runtime requirements and solution quality, especially for larger data sets.
MOLECULAR ECOLOGY RESOURCES
(2023)
Article
Engineering, Industrial
Kai Wang, Jian Li, Fugee Tsung
Summary: This study proposes an efficient and interpretable probabilistic tensor decomposition model for monitoring high-dimensional categorical data. By decomposing a huge tensor into latent classes, the number of model parameters is dramatically reduced. To improve interpretability, a novel polarization regularization method is used. Extensive simulations and a real case study validate the superior inference and monitoring performance of the proposed method.
Article
Business, Finance
Juan Chen, Feng Ma, Xuemei Qiu, Tao Li
Summary: This study examines the predictive ability of categorical economic-policy uncertainty (EPU) indices for stock-market returns. The findings suggest that certain categorical EPU indices outperform the original EPU index and popular predictors in predicting stock returns, achieving higher realized utility. Moreover, diffusion indices based on EPU categories, particularly those utilizing partial least squares (PLS) to extract principal components, effectively utilize forecast information from categorical EPU indices, resulting in improved forecast performance, reduced errors, and increased economic value for investors. Additionally, categorical EPU indices demonstrate superior forecasting performance during economic expansions, the China-US trade war, and the COVID-19 pandemic.
INTERNATIONAL REVIEW OF ECONOMICS & FINANCE
(2023)
Article
Geography
Paul H. Jung, Jun Song
Summary: Recent neighborhood studies have focused on longitudinal aspects of neighborhood change and data-mining methodologies. Using a functional data analysis method, neighborhood trajectory clusters are identified through mathematically represented multivariate time-dependent curves. The model has been applied to the Charlotte and Detroit areas to analyze ongoing racial and socioeconomic segregation patterns and time dynamics of neighborhood change.
GEOGRAPHICAL ANALYSIS
(2022)
Article
Transportation
Subasish Das, Md Mahmud Hossain, M. Ashifur Rahman, Xiaoqiang Kong, Xiaoduan Sun, G. M. Al Mamun
Summary: Moped and seated motor scooter riders face a high risk of being involved in crashes. This study analyzed fatal crash data and identified critical clusters in order to develop interventions to minimize collisions and fatalities.
TRANSPORTMETRICA A-TRANSPORT SCIENCE
(2023)
Article
Computer Science, Artificial Intelligence
Bhisham Dev Verma, Rameshwar Pratap, Debajyoti Bera
Summary: This paper presents a dimensionality reduction algorithm for categorical datasets, which constructs low-dimensional binary sketches from high-dimensional categorical vectors and approximates the Hamming distance between any two original vectors. The approach is particularly useful for sparse datasets and has been rigorously analyzed and experimentally validated.
DATA MINING AND KNOWLEDGE DISCOVERY
(2022)
Article
Computer Science, Interdisciplinary Applications
Chenlu Ke, Wei Yang, Qingcong Yuan, Lu Li
Summary: Variable screening is an important tool for dimension reduction in ultrahigh dimensional data analysis. This study proposes a partial sufficient variable screening method for the presence of control variables, which aims to reduce the predictive set without losing regression information. The method achieves variable screening by constraining the reduction of continuous variables using the subpopulations identified by categorical variables. The effectiveness of the method is demonstrated through simulation studies and an application in gene screening for diffuse large-B-cell lymphoma prognosis.
COMPUTATIONAL STATISTICS & DATA ANALYSIS
(2023)
Article
Neurosciences
Antonio Cerrato, Daniela Pacella, Francesco Palumbo, Diane Beauvais, Michela Ponticorvo, Orazio Miglino, Paolo Bartolomeo
Summary: E-TAN is a technology-enhanced platform for assessing patients with visual neglect. It records the location, sequence, and timing of objects through an automated process, effectively discriminating patients with visual neglect from those without. Patients can use this platform at home to track the effects of rehabilitation.
NEUROPSYCHOLOGICAL REHABILITATION
(2021)
Article
Statistics & Probability
Rosaria Romano, Francesco Palumbo
Summary: The paper introduces a new method called partial possibilistic regression path modeling, which combines principles of path modeling and possibilistic regression to model relationships among blocks of variables. The comparison with a classical composite-based path model is based on a simulation study, while a case study on the use of Wikipedia in higher education illustrates the usability context of the proposed method.
COMPUTATIONAL STATISTICS
(2021)
Article
Statistics & Probability
A. Iodice D'Enza, A. Markos, F. Palumbo
Summary: Two chunk-wise implementations of RPCA suitable for tall data sets are proposed in this paper, with one for distributed computation and the other for incremental computation. Experimental results show that the distributed approach performs similarly to batch RPCA for data with completely random missing entries, while the incremental approach shows good performance for data with non-completely random missing entries if the first analyzed chunks contain sufficient information on the data structure.
STATISTICAL METHODS AND APPLICATIONS
(2022)
Article
Statistics & Probability
Pieter C. Schoonees, Patrick J. F. Groenen, Michel van de Velden
Summary: A least-squares bilinear clustering framework is introduced to model three-way data, combining bilinear decompositions of two-way matrices with clustering over observations. Different clusterings are defined for each part of the bilinear decomposition, reducing computational burden. The method simplifies the joint clustering problem into separate problems that can be handled independently.
ADVANCES IN DATA ANALYSIS AND CLASSIFICATION
(2022)
Review
Oncology
Federica Vallone, Daniela Lemmo, Maria Luisa Martino, Anna Rosa Donizzetti, Maria Francesca Freda, Francesco Palumbo, Elvira Lorenzo, Angelo D'Argenzio, Daniela Caso
Summary: This study systematically reviewed factors promoting breast, cervical and colorectal cancer screening participation. By analyzing 102 studies, multiple individual-level, relational-level, and healthcare system-level factors were identified, which could potentially enhance the participation in screenings worldwide. The fragmentation of research highlights the need for better integration of study results.
Article
Computer Science, Artificial Intelligence
Cristina Tortora, Francesco Palumbo
Summary: This paper discusses a probabilistic distance clustering method adjusted for cluster size (PDQ) for handling mixed-type data, shows its advantages through a simulation design, and applies it to a real data set.
APPLIED SOFT COMPUTING
(2022)
Article
Statistics & Probability
Maria Iannario, Alfonso Iodice D'Enza, Rosaria Romano
Summary: This article introduces a method for dealing with ordinal response data, which tackles the problem by synthesizing multiple item responses into a meta-item and modeling the meta-item using regression approaches. Variable selection is performed automatically using a recursive partitioning method based on trees.
COMPUTATIONAL STATISTICS
(2022)
Article
Clinical Neurology
Domenico Solari, Gilda Cennamo, Francesca Amoroso, Federico Frio, Piero Donna, Alfonso Iodice D'Enza, Antonietta Melenzane, Teresa Somma, Fausto Tranfa, Luigi M. Cavallo
Summary: This study investigated the relationship between postoperative visual recovery and SD-OCT and best corrected visual acuity in patients with compression of the optic chiasm and nerve. The results showed that most patients experienced varying degrees of visual and visual field improvements after surgery. There was a direct correlation between preoperative retinal status and functional recovery, which can be used to predict postoperative visual recovery.
JOURNAL OF NEUROSURGICAL SCIENCES
(2022)
Review
Psychology, Clinical
Daniela Lemmo, Maria Luisa Martino, Federica Vallone, Anna Rosa Donizzetti, Maria Francesca Freda, Francesco Palumbo, Elvira Lorenzo, Angelo D'Argenzio, Daniela Caso
Summary: Research has found various psychosocial factors associated with ongoing cancer screenings, but a systematic review on the theoretical frameworks and constructs used in studies on breast, cervical, and colorectal cancer screening participation has not been conducted. This study aimed to identify the main theoretical frameworks and constructs in the literature over the past five years to explain cancer screening participation. A search of databases was conducted according to PRISMA guidelines and 24 articles met the inclusion criteria. The findings highlight the diverse theoretical frameworks and constructs used to predict or promote cancer screening adherence and emphasize the need for further research to improve screening promotion interventions.
INTERNATIONAL JOURNAL OF CLINICAL AND HEALTH PSYCHOLOGY
(2023)
Correction
Multidisciplinary Sciences
Michael Greenacre, Patrick J. F. Groenen, Trevor Hastie, Alfonso Iodice D'Enza, Angelos Markos, Elena Tuzhilina
NATURE REVIEWS METHODS PRIMERS
(2023)
Article
Multidisciplinary Sciences
Michael Greenacre, Patrick J. F. Groenen, Trevor Hastie, Alfonso Iodice D'Enza, Angelos Markos, Elena Tuzhilina
Summary: Principal component analysis is a versatile statistical method that reduces a large data table to its essential features. It explains the variance of the data by finding major components and supports graphical interpretation. Additionally, it can be used for handling incomplete data matrices and analyzing images, shapes, and functions.
NATURE REVIEWS METHODS PRIMERS
(2022)
Article
Psychology, Applied
Rosa Fabbricatore, Davide Marocco, Francesco Palumbo, Serafino Buono, Santo Di Nuovo
TPM-TESTING PSYCHOMETRICS METHODOLOGY IN APPLIED PSYCHOLOGY
(2020)
Article
Computer Science, Interdisciplinary Applications
Angelos Markos, Alfonso Iodice D'Enza, Michel van de Velden
JOURNAL OF STATISTICAL SOFTWARE
(2019)