Article
Engineering, Multidisciplinary
Hongwei Du, Qiang Ye, Zhipeng Sun, Chuang Liu, Wen Xu
Summary: This study introduces two novel outlier detection algorithms for categorical data sets: Outlier Detection Tree (ODT) and FAST-ODT. ODT uses a classification tree and if-then rules to detect outliers in categorical data, while FAST-ODT achieves high detection accuracy with low time complexity.
IEEE TRANSACTIONS ON NETWORK SCIENCE AND ENGINEERING
(2021)
Article
Computer Science, Artificial Intelligence
Junli Li, Zhanfeng Liu
Summary: Outlier detection plays a crucial role in data mining. However, most existing algorithms focus on either numerical or categorical attributes and neglect the mixture of attributes commonly found in real-world data. In this study, we propose PMIOD, an outlier detection algorithm for high-dimensional, massive mixed data that weights attributes using mutual information. We also parallelize the mutual information computation on the Spark platform to improve efficiency. Experimental results on various datasets demonstrate the superior performance of the proposed algorithm.
EXPERT SYSTEMS WITH APPLICATIONS
(2024)
Article
Computer Science, Artificial Intelligence
Fuyuan Cao, Xiaolin Wu, Liqin Yu, Jiye Liang
Summary: This paper proposes an outlier detection algorithm for matrix-object data sets, which describes and calculates the outlier factor of matrix objects based on their coupling and cohesion. Experimental results on real and synthetic data sets show that the proposed algorithm detects outliers effectively in comparison with other algorithms.
APPLIED SOFT COMPUTING
(2021)
Article
Computer Science, Artificial Intelligence
Lianxi Wang, Yubing Ke
Summary: This paper proposes a feature selection method for outlier detection in categorical data, taking into account the feature relevance, interaction, redundancy, and complementarity. Experimental results demonstrate that the proposed method outperforms five other state-of-the-art feature selection methods on 14 real-world datasets.
KNOWLEDGE-BASED SYSTEMS
(2023)
Article
Physics, Multidisciplinary
Zihao Li, Liumei Zhang
Summary: This paper proposes a new outlier detection algorithm called EOEH, which improves detection performance on high-dimensional data by utilizing random subsampling and information entropy-weighted subspaces. Experiments demonstrate that the EOEH algorithm outperforms popular outlier detection algorithms in terms of detection performance and runtime efficiency.
Article
Computer Science, Information Systems
Qinli Zhang, Yiying Chen, Gangqiang Zhang, Zhaowen Li, Lijun Chen, Ching-Feng Wen
Summary: The paper discusses the handling of categorical data in machine learning, introducing fuzzy information structures and new uncertainty measurements for considering the equality of attribute values. Numerical experiments and statistical tests were conducted to evaluate the performance of the proposed measurements, showing that they outperform traditional measurements based on I-structures. Furthermore, attribute reduction algorithms based on the new measurements were presented and tested in clustering analysis, showing effective performance in reducing attributes.
INFORMATION SCIENCES
(2021)
Article
Computer Science, Artificial Intelligence
Guansong Pang, Longbing Cao, Ling Chen
Summary: This study introduces a novel outlier detection framework to identify outliers in categorical data by capturing non-IID outlier factors. A graph representation and mining approach is employed to capture the rich non-IID characteristics.
DATA MINING AND KNOWLEDGE DISCOVERY
(2021)
Article
Computer Science, Artificial Intelligence
Feng Wang, Jiye Liang, Peng Song
Summary: Feature selection is a widely used data preprocessing technique to improve model performance and efficiency. However, traditional approaches assume that data are independent and identically distributed (IID). This paper introduces new coupled similarity and relevance measures to capture coupling relationships between feature values and features. Based on coupling learning, an effective feature-selection algorithm for categorical data is developed and validated using common classifiers and UCI datasets.
INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS
(2023)
Article
Computer Science, Information Systems
Ran Li, Hongchang Chen, Shuxin Liu, Xing Li, Yingle Li, Biao Wang
Summary: Outlier detection is a challenging task due to the nature of ubiquitous, incomplete, redundant, noisy, and mixed data. To address this challenge, this paper proposes an ILGNI network that considers both local and global information from incomplete mixed data. The network enhances connectivity between similar objects and weakens connectivity between heterogeneous objects, allowing for efficient graph-based outlier detection. Experiments on telecom fraud datasets demonstrate that the proposed algorithm achieves enhanced outlier detection performance with low time complexity and is applicable to various types of datasets.
INFORMATION SCIENCES
(2023)
Article
Computer Science, Artificial Intelligence
Fabrizio Angiulli, Fabio Fassetti, Luigi Palopoli, Cristina Serrao
Summary: This work focuses on the detection and explanation of anomalous values in categorical datasets. The authors propose the concept of frequency occurrence and an outlierness measure for identifying lower and upper outliers. They also provide interpretable explanations and a mechanism for selecting outstanding explanations.
APPLIED INTELLIGENCE
(2022)
Article
Multidisciplinary Sciences
Illia Horenko
Summary: Entropic outlier sparsification (EOS) is a cheap and robust computational strategy for learning in the presence of data anomalies and outliers. EOS solves the expected loss minimization problem with Shannon entropy regularization, providing a closed-form solution that incurs additional costs linearly dependent on statistics size and independent of data dimension. The results explain the optimality of using mixtures of spherically symmetric Gaussians for nonparametric probability distributions in algorithms working with squared Euclidean distances. Experimental results demonstrate that applying EOS to biomedical problems enables accurate prediction of patient mortality after heart failure, outperforming common learning tools.
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
(2022)
Article
Computer Science, Artificial Intelligence
Zhong-Yang Xiong, Hua Long, Yu-Fang Zhang, Xiao-Xia Wang, Qin-Qin Gao, Lin-Tao Li, Min Zhang
Summary: Outlier detection is an important research direction in data mining, and most existing methods are not suitable for complex patterns. To address this, we propose a neighborhood-weighted outlier detection algorithm that measures the local density of objects using a weighted nearest neighbor graph and compares differences in neighborhood-weighted local density to determine each object's degree of outlierness.
APPLIED INTELLIGENCE
(2023)
Article
Physics, Multidisciplinary
Ting-Li Chen, Elizabeth P. P. Chou, Hsieh Fushing
Summary: This research selects collections of major factors embedded within response-versus-covariate dynamics based on information-theoretic measurements through the Categorical Exploratory Data Analysis (CEDA) computing paradigm, and explores the relevance to Wiener-Granger causality. The selection task identifies a chief collection and several secondary collections, with reliability checks through algorithmic computations.
Article
Computer Science, Information Systems
Lina Wang, Qixiang Zhang, Xiling Niu, Yongjun Ren, Jinyue Xia
Summary: Outlier detection is a crucial area in data mining, aiming to identify inconsistencies in data sets. The proposed method reduces data dimensions to enhance performance and applies effectively to numerical and mixed multidimensional data, giving it the potential to improve outlier detection accuracy.
CMC-COMPUTERS MATERIALS & CONTINUA
(2021)
Article
Computer Science, Artificial Intelligence
Qiang Gao, Qin-Qin Gao, Zhong-Yang Xiong, Yu-Fang Zhang, Yu-Qin Wang, Min Zhang
Summary: This paper studies in depth the problems of detecting low-density patterns and local outliers in outlier detection algorithms, and proposes a double-weighted algorithm that considers the dense direction. The algorithm explores the relationship between data points and their neighbor distribution by considering distance and orientation, designs new point weighting and edge weighting strategies, and achieves a better representation of the potential structural information inside the data.
APPLIED INTELLIGENCE
(2023)