Article

Exploring incomplete data using visualization techniques

Journal

ADVANCES IN DATA ANALYSIS AND CLASSIFICATION
Volume 6, Issue 1, Pages 29-47

Publisher

SPRINGER HEIDELBERG
DOI: 10.1007/s11634-011-0102-y

Keywords

Visualization; Missing values; Exploring incomplete data; R software

Funding

  1. European Union [217322]

Visualizing incomplete data makes it possible to explore the data and the structure of the missing values simultaneously. This helps in learning about the distribution of the incomplete information in the data and in identifying possible structures of the missing values and their relation to the available information. The main goal of this contribution is to stress the importance of exploring missing values using visualization methods and to present a collection of such visualization techniques for incomplete data, all of which are implemented in the R package VIM. Because this functionality is provided within this widely used statistical environment, visualization of missing values, imputation, and data analysis can all be done from within R without additional software.
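As a rough illustration of what "exploring the structure of missing values" can mean in practice, the sketch below tabulates how many values are missing per variable and which combinations of variables are missing together. This is the kind of summary that a missing-value visualization would typically display. It is a plain-Python sketch, not the VIM package or its API; the toy data set, the variable names, and the helper functions `missing_counts` and `missing_patterns` are all hypothetical.

```python
from collections import Counter

# Hypothetical toy data set; None marks a missing value.
rows = [
    {"age": 34, "income": 52000, "height": 1.80},
    {"age": None, "income": 48000, "height": 1.65},
    {"age": 29, "income": None, "height": None},
    {"age": None, "income": 61000, "height": 1.72},
    {"age": 41, "income": None, "height": None},
]
variables = ["age", "income", "height"]


def missing_counts(rows, variables):
    """Number of missing values for each variable."""
    return {v: sum(row[v] is None for row in rows) for v in variables}


def missing_patterns(rows, variables):
    """Frequency of each missingness pattern across observations.

    A pattern is a tuple of booleans, one per variable, where True
    means the value is missing in that observation.
    """
    return Counter(tuple(row[v] is None for v in variables) for row in rows)


counts = missing_counts(rows, variables)
patterns = missing_patterns(rows, variables)

# Per-variable summary.
for v in variables:
    print(f"{v:>8}: {counts[v]} missing")

# Pattern summary: 'M' marks a missing variable, '.' an observed one.
for pattern, n in patterns.most_common():
    label = "".join("M" if is_missing else "." for is_missing in pattern)
    print(label, n)
```

In this toy data, income and height are always missing together while age is missing on its own. Per-variable counts alone would hide that structure; tabulating the patterns reveals it.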

Recommended

Article Statistics & Probability

A comparison of generalised linear models and compositional models for ordered categorical data

Ondrej Vencalek, Karel Hron, Peter Filzmoser

STATISTICAL MODELLING (2020)

Article Computer Science, Interdisciplinary Applications

Cellwise robust M regression

P. Filzmoser, S. Hoppner, I. Ortner, S. Serneels, T. Verdonck

COMPUTATIONAL STATISTICS & DATA ANALYSIS (2020)

Article Computer Science, Artificial Intelligence

Robust and sparse multigroup classification by the optimal scoring approach

Irene Ortner, Peter Filzmoser, Christophe Croux

DATA MINING AND KNOWLEDGE DISCOVERY (2020)

Article Statistics & Probability

Robust principal component analysis for compositional tables

J. de Sousa, K. Hron, K. Facevicova, P. Filzmoser

Summary: Compositional tables are arranged according to two factors and analyzed by ratios between cells. A special choice of coordinates related to centered logratio coefficients is proposed for interpretation and use in robust principal component analysis. This method enables exploration of relationships between factors while addressing the singularity issue of clr coefficients.

JOURNAL OF APPLIED STATISTICS (2021)

Article Geosciences, Multidisciplinary

Weighted Symmetric Pivot Coordinates for Compositional Data with Geochemical Applications

Karel Hron, Mark Engle, Peter Filzmoser, Eva Fiserova

Summary: Negative correlations between elements, molecules, or minerals can indicate various geochemical processes. Symmetric pivot coordinates are developed to identify positive and negative correlations between different parts in compositional data.

MATHEMATICAL GEOSCIENCES (2021)

Article Geosciences, Multidisciplinary

Classical and Robust Regression Analysis with Compositional Data

K. G. van den Boogaart, P. Filzmoser, K. Hron, M. Templ, R. Tolosana-Delgado

Summary: Compositional data contain valuable information within the relationships between the compositional parts, which can be utilized for regression modeling. Balance coordinates are constructed to interpret regression coefficients and test hypotheses of subcompositional independence. Both classical least-squares regression and robust MM regression were compared within different regression models using a real data set from a geochemical mapping project.

MATHEMATICAL GEOSCIENCES (2021)

Article Geochemistry & Geophysics

pXRF Measurements on Soil Samples for the Exploration of an Antimony Deposit: Example from the Vendean Antimony District (France)

Bruno Lemiere, Jeremie Melleton, Pascal Auger, Virginie Derycke, Eric Gloaguen, Loic Bouat, Dominika Miksova, Peter Filzmoser, Maarit Middleton

MINERALS (2020)

Article Statistics & Probability

Robust regression with compositional covariates including cellwise outliers

Nikola Stefelova, Andreas Alfons, Javier Palarea-Albaladejo, Peter Filzmoser, Karel Hron

Summary: The study presents a robust procedure for estimating a linear regression model with compositional and real-valued explanatory variables, designed to handle outliers and produce results aligned with established scientific knowledge. By filtering and imputing cellwise outliers before performing rowwise robust compositional regression, the proposed procedure outperforms traditional and other robust regression methods.

ADVANCES IN DATA ANALYSIS AND CLASSIFICATION (2021)

Article Biochemistry & Molecular Biology

Statistical Analysis of Chemical Element Compositions in Food Science: Problems and Possibilities

Matthias Templ, Barbara Templ

Summary: Our study compares compositional data analysis (CoDa) with classical statistical analysis to demonstrate how results vary depending on the approach, which is shown to matter for methods such as principal component analysis (PCA) and log-ratio analysis. It emphasizes the need to apply CoDa methods for better separation, interpretability, and classification accuracy in analyzing food chemical elements and characterizing food products.

MOLECULES (2021)

Article Computer Science, Information Systems

A systematic overview on methods to protect sensitive data provided for various analyses

Matthias Templ, Murat Sariyar

Summary: Considering the advancements in protecting sensitive data, especially in privacy-preserving computation and federated learning, there is a need to categorize and compare various methods from different fields. Providing guidance for practice is important, as it helps practitioners have an overview of suitable approaches for specific scenarios. This categorization also contributes to the development of a comprehensive ontology for anonymization.

INTERNATIONAL JOURNAL OF INFORMATION SECURITY (2022)

Article Geochemistry & Geophysics

A new version of the Langelier-Ludwig square diagram under a compositional perspective

Matthias Templ, Caterina Gozzi, Antonella Buccianti

Summary: The Langelier-Ludwig square diagram is a commonly used diagnostic tool in groundwater chemistry, but the classic version may lead to incorrect conclusions. A new version of the diagram is proposed, which provides a better and unbiased understanding of water-environment interactions by describing the intricate relationship between chemical species in aqueous solutions.

JOURNAL OF GEOCHEMICAL EXPLORATION (2022)

Article Public, Environmental & Occupational Health

Privacy of Study Participants in Open-access Health and Demographic Surveillance System Data: Requirements Analysis for Data Anonymization

Matthias Templ, Chifundo Kanjala, Inken Siems

Summary: This study aims to highlight the requirements and solutions for sharing health surveillance event history data. The proposed approaches enable the anonymization of data while preserving utility and reducing the risk of disclosure, making the data shareable as public use data. This is particularly significant for HDSS and medical science research communities in low- and middle-income countries.

JMIR PUBLIC HEALTH AND SURVEILLANCE (2022)

Article Mathematics

Enhancing Precision in Large-Scale Data Analysis: An Innovative Robust Imputation Algorithm for Managing Outliers and Missing Values

Matthias Templ

Summary: In the complex world of data analytics, multiple imputation has emerged as a key tool for addressing missing data, and its powerful variant, robust imputation, further enhances the precision and reliability of its results. Non-robust methods can be influenced by extreme outliers, leading to skewed imputations and biased estimates. Robust imputation methods effectively manage outliers and provide a more reliable approach to dealing with missing data.

MATHEMATICS (2023)

Article Computer Science, Interdisciplinary Applications

Robust Mediation Analysis: The R Package robmed

Andreas Alfons, Nufer Y. Ates, Patrick J. F. Groenen

Summary: Mediation analysis is a widely used statistical technique in social, behavioral, and medical sciences for studying the indirect effects of independent variables on dependent variables through intervening variables. However, existing methods are sensitive to outliers and deviations from normality assumptions, which can threaten the empirical testing of mediation mechanisms. The robmed package in R implements a robust procedure for mediation analysis that addresses these issues and provides various analysis methods and result visualization.

JOURNAL OF STATISTICAL SOFTWARE (2022)

Article Psychology, Applied

A Robust Bootstrap Test for Mediation Analysis

Andreas Alfons, Nufer Yasin Ates, Patrick J. F. Groenen

Summary: Mediation analysis is crucial in organizational sciences, but traditional linear regression analysis based on normal-theory maximum likelihood estimators is sensitive to deviations from normality assumptions. To address this issue, a robust mediation method has been developed, which demonstrates superior estimation of effect size and reliability in assessing significance, along with freely available software for empirical researchers.

ORGANIZATIONAL RESEARCH METHODS (2022)
