Article
Ecology
Thomas F. Johnson, Nick J. B. Isaac, Agustin Paviolo, Manuela Gonzalez-Suarez
Summary: The study evaluated approaches for handling missing values in biased datasets and found that imputation can handle missing data effectively under some conditions but is not always the best solution. None of the tested methods could deal effectively with severe biases, which highlights the importance of rigorous data checking. The study also proposes variables to help researchers detect and minimize errors in incomplete datasets.
GLOBAL ECOLOGY AND BIOGEOGRAPHY
(2021)
Article
Biochemical Research Methods
Mengbo Li, Gordon K. Smyth
Summary: Mass spectrometry proteomics in biomedical research suffers from the problem of missing values in peptides. Many analysis strategies have been proposed to distinguish different types of missing values and estimate detection probabilities. A logit-linear function is used to accurately model the detection probability, showing that missing values are related to peptide intensity. A probability model is developed to infer the distribution of unobserved intensities from observed values.
Article
Computer Science, Software Engineering
Sara Johansson Fernstad, Jimmy Johansson Westberg
Summary: This article introduces a novel visualization method called Missingness Glyph for analyzing and exploring missing values in data. The Missingness Glyph helps to identify relevant missingness patterns and performs better than alternative visualization methods in certain cases.
IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS
(2022)
Article
Biochemical Research Methods
Laurent Gatto, Sebastian Gibb, Johannes Rainer
Summary: Version 2 of the MSnbase R/Bioconductor package is focused on new on-disk infrastructure for manipulating, processing, and visualizing mass spectrometry data. This update allows handling of large raw mass spectrometry experiments on commodity hardware, showcasing elegant data processing, method development, and visualization capabilities.
JOURNAL OF PROTEOME RESEARCH
(2021)
Article
Multidisciplinary Sciences
Wanyanhan Jiang, Han Chen, Lian Yang, Xiaoqi Pan
Summary: When comparing means of different groups, it is necessary to explore and compare data for influencing factors or relative indices. This can be a complex and challenging process, especially for users who lack statistical knowledge and coding experience. To address this issue, we developed moreThanANOVA, an interactive, user-friendly, open-source, and cloud-based application that automates distribution tests and correlative significance tests, allowing users to customize post-hoc analysis based on their considerations.
Article
Computer Science, Information Systems
Duy-Tai Dinh, Van-Nam Huynh, Songsak Sriboonchitta
Summary: This paper introduces k-CMM, a novel clustering algorithm for handling missing values in mixed numerical and categorical data that integrates the imputation and clustering steps. The algorithm uses decision tree, mean, and kernel methods for cluster center formation, and it outperforms other algorithms as the proportion of missing values in the dataset increases.
INFORMATION SCIENCES
(2021)
Article
Computer Science, Information Systems
Kaj Dreef, Vijay Krishna Palepu, James A. Jones
Summary: Current software-development tools make it difficult to understand the test execution of software, both for granular tasks (e.g., identifying test cases for a specific method) and global tasks (e.g., determining the proportion of unit tests to system tests). Existing tools lack a global overview and historical information. This paper proposes a novel, interactive, matrix-based visual interface to address these challenges and provides a user study and case studies to demonstrate its effectiveness.
INFORMATION AND SOFTWARE TECHNOLOGY
(2023)
Article
Biochemistry & Molecular Biology
Shuangbin Xu, Zehan Dai, Pingfan Guo, Xiaocong Fu, Shanshan Liu, Lang Zhou, Wenli Tang, Tingze Feng, Meijun Chen, Li Zhan, Tianzhi Wu, Erqiang Hu, Yong Jiang, Xiaochen Bo, Guangchuang Yu
Summary: ggtreeExtra is a universal tool for visualizing tree data, supporting various data types and visualization methods. By integrating evolutionary statistics and external data, it extends the applications of phylogenetic trees in different disciplines.
MOLECULAR BIOLOGY AND EVOLUTION
(2021)
Article
Health Care Sciences & Services
Jiang Li, Xiaowei S. Yan, Durgesh Chaudhary, Venkatesh Avula, Satish Mudiganti, Hannah Husby, Shima Shahjouei, Ardavan Afshar, Walter F. Stewart, Mohammed Yeasin, Ramin Zand, Vida Abedi
Summary: Laboratory data from EHRs can be used in prediction models, and imputation methods can mitigate estimation bias and improve model performance when values are missing. The study found that missingness in EHR laboratory variables was associated with patients' comorbidity data, and that the multi-level imputation algorithm showed smaller imputation error than the cross-sectional method.
NPJ DIGITAL MEDICINE
(2021)
Article
Computer Science, Software Engineering
Maoyuan Sun, Abdul Rahman Shaikh, Hamed Alhoori, Jian Zhao
Summary: This paper presents SightBi, a visual analytics approach for exploring cross-view data relationships. SightBi formalizes cross-view data relationships, computes them, and utilizes a bi-context design to provide stand-alone relationship views for guiding user exploration. A usage scenario demonstrates the usefulness of SightBi for sensemaking of cross-view data relationships.
IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS
(2022)
Article
Computer Science, Information Systems
Roozbeh Razavi-Far, Daoming Wan, Mehrdad Saif, Niloofar Mozafari
Summary: This article evaluates the main strategies for treating missing values in misbehavior detection using incomplete V2X communications data. It proposes two novel methods for imputing and tolerating missing data and compares them with existing methods. The results show that the proposed missing-tolerant method outperforms the others in terms of accuracy and F-measure.
IEEE INTERNET OF THINGS JOURNAL
(2022)
Review
Biochemical Research Methods
Weijia Kong, Harvard Wai Hann Hui, Hui Peng, Wilson Wen Bin Goh
Summary: Proteomics data often have missing values, which can affect subsequent statistical analyses. Different missing value imputation methods have been developed, and their performance varies when dealing with the same dataset. Choosing the right method is important for satisfactory results, and other factors such as confounders should also be considered.
Article
Computer Science, Interdisciplinary Applications
Konstantinos Kagkelidis, Ilias Dimitriadis, Athena Vakali
Summary: This paper discusses the improvement of complex visualization pipelines and introduces Lumina, a visualization framework that aims to simplify user experience and interaction, while enhancing the final visualization results based on semantic analysis of linked data.
JOURNAL OF VISUALIZATION
(2021)
Article
Computer Science, Artificial Intelligence
Lin Sun, Tianxiang Wang, Weiping Ding, Jiucheng Xu, Anhui Tan
Summary: This paper presents a neighborhood-based multilabel classification method for dealing with missing labels in real-world multilabel data. By defining the neighborhood radius, restoring missing feature values, and investigating the fuzzy similarity relationship among samples, the classification performance of data with missing labels is improved.
INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS
(2022)
Article
Energy & Fuels
Antonio Liguori, Romana Markovic, Martina Ferrando, Jerome Frisch, Francesco Causone, Christoph van Treeck
Summary: This study investigates the use of data augmentation techniques for reconstructing missing energy time-series in limited data scenarios. A convolutional denoising autoencoder is chosen as the base imputation model, and an optimal augmentation rate is determined based on preliminary results. The results show that augmenting a nine-day-long training set 80 times can significantly reduce the initial average RMSE and outperform benchmark methods.
Article
Statistics & Probability
Ondrej Vencalek, Karel Hron, Peter Filzmoser
STATISTICAL MODELLING
(2020)
Article
Computer Science, Interdisciplinary Applications
P. Filzmoser, S. Hoppner, I. Ortner, S. Serneels, T. Verdonck
COMPUTATIONAL STATISTICS & DATA ANALYSIS
(2020)
Article
Computer Science, Artificial Intelligence
Irene Ortner, Peter Filzmoser, Christophe Croux
DATA MINING AND KNOWLEDGE DISCOVERY
(2020)
Article
Statistics & Probability
J. de Sousa, K. Hron, K. Facevicova, P. Filzmoser
Summary: Compositional tables are arranged according to two factors and analyzed by ratios between cells. A special choice of coordinates related to centered logratio coefficients is proposed for interpretation and use in robust principal component analysis. This method enables exploration of relationships between factors while addressing the singularity issue of clr coefficients.
JOURNAL OF APPLIED STATISTICS
(2021)
Article
Geosciences, Multidisciplinary
Karel Hron, Mark Engle, Peter Filzmoser, Eva Fiserova
Summary: Negative correlations between elements, molecules, or minerals can indicate various geochemical processes. Symmetric pivot coordinates are developed to identify positive and negative correlations between different parts in compositional data.
MATHEMATICAL GEOSCIENCES
(2021)
Article
Geosciences, Multidisciplinary
K. G. van den Boogaart, P. Filzmoser, K. Hron, M. Templ, R. Tolosana-Delgado
Summary: Compositional data contain valuable information within the relationships between the compositional parts, which can be utilized for regression modeling. Balance coordinates are constructed to interpret regression coefficients and test hypotheses of subcompositional independence. Both classical least-squares regression and robust MM regression were compared within different regression models using a real data set from a geochemical mapping project.
MATHEMATICAL GEOSCIENCES
(2021)
Article
Geochemistry & Geophysics
Bruno Lemiere, Jeremie Melleton, Pascal Auger, Virginie Derycke, Eric Gloaguen, Loic Bouat, Dominika Miksova, Peter Filzmoser, Maarit Middleton
Article
Statistics & Probability
Nikola Stefelova, Andreas Alfons, Javier Palarea-Albaladejo, Peter Filzmoser, Karel Hron
Summary: The study presents a robust procedure for estimating a linear regression model with compositional and real-valued explanatory variables, designed to handle outliers and produce results aligned with established scientific knowledge. By filtering and imputing cellwise outliers before performing rowwise robust compositional regression, the proposed procedure outperforms traditional and other robust regression methods.
ADVANCES IN DATA ANALYSIS AND CLASSIFICATION
(2021)
Article
Biochemistry & Molecular Biology
Matthias Templ, Barbara Templ
Summary: Our study compares compositional data analysis (CoDa) with classical statistical analysis to demonstrate how results vary depending on the approach, with importance shown for methods like principal component analysis (PCA) and log-ratio analysis. It emphasizes the need to apply CoDa methods for better separation, interpretability, and classification accuracy in analyzing food chemical elements and characterizing food products.
Article
Computer Science, Information Systems
Matthias Templ, Murat Sariyar
Summary: Considering the advancements in protecting sensitive data, especially in privacy-preserving computation and federated learning, there is a need to categorize and compare various methods from different fields. Providing guidance for practice is important, as it helps practitioners have an overview of suitable approaches for specific scenarios. This categorization also contributes to the development of a comprehensive ontology for anonymization.
INTERNATIONAL JOURNAL OF INFORMATION SECURITY
(2022)
Article
Geochemistry & Geophysics
Matthias Templ, Caterina Gozzi, Antonella Buccianti
Summary: The Langelier-Ludwig square diagram is a commonly used diagnostic tool in groundwater chemistry, but the classic version may lead to incorrect conclusions. A new version of the diagram is proposed, which provides a better and unbiased understanding of water-environment interactions by describing the intricate relationship between chemical species in aqueous solutions.
JOURNAL OF GEOCHEMICAL EXPLORATION
(2022)
Article
Public, Environmental & Occupational Health
Matthias Templ, Chifundo Kanjala, Inken Siems
Summary: This study aims to highlight the requirements and solutions for sharing health surveillance event history data. The proposed approaches enable the anonymization of data while preserving utility and reducing the risk of disclosure, making the data shareable as public use data. This is particularly significant for HDSS and medical science research communities in low- and middle-income countries.
JMIR PUBLIC HEALTH AND SURVEILLANCE
(2022)
Article
Mathematics
Matthias Templ
Summary: In the complex world of data analytics, multiple imputation has emerged as a key tool for addressing missing data, and its powerful variant, robust imputation, further enhances the precision and reliability of its results. Non-robust methods can be influenced by extreme outliers, leading to skewed imputations and biased estimates. Robust imputation methods effectively manage outliers and provide a more reliable approach to dealing with missing data.
Article
Computer Science, Interdisciplinary Applications
Andreas Alfons, Nufer Y. Ates, Patrick J. F. Groenen
Summary: Mediation analysis is a widely used statistical technique in social, behavioral, and medical sciences for studying the indirect effects of independent variables on dependent variables through intervening variables. However, existing methods are sensitive to outliers and deviations from normality assumptions, which can threaten the empirical testing of mediation mechanisms. The robmed package in R implements a robust procedure for mediation analysis that addresses these issues and provides various analysis methods and result visualization.
JOURNAL OF STATISTICAL SOFTWARE
(2022)
Article
Psychology, Applied
Andreas Alfons, Nufer Yasin Ates, Patrick J. F. Groenen
Summary: Mediation analysis is crucial in organizational sciences, but traditional linear regression analysis based on normal-theory maximum likelihood estimators is sensitive to deviations from normality assumptions. To address this issue, a robust mediation method has been developed, which demonstrates superior estimation of effect size and reliability in assessing significance, along with freely available software for empirical researchers.
ORGANIZATIONAL RESEARCH METHODS
(2022)