Article

A novel outlier cluster detection algorithm without top-n parameter

Journal

KNOWLEDGE-BASED SYSTEMS
Volume 121, Pages 32-40

Publisher

ELSEVIER SCIENCE BV
DOI: 10.1016/j.knosys.2017.01.013

Keywords

Outlier detection; Outlier clusters; Top-n problem; Mutual neighbor

Funding

  1. National Natural Science Foundation of China [61272194]
  2. [KJZH17104]

Outlier detection is an important data mining task with numerous applications, including credit card fraud detection and video surveillance, and it has been widely studied in recent years. Although many outlier detection algorithms have been proposed, most of them face the top-n problem: it is difficult to know in advance how many points in a database are outliers. In this paper we propose a novel outlier-cluster detection algorithm called ROCF, which extends the concept of an object's outlier factor to clusters. ROCF is based on the mutual neighbor graph and on the idea that outlier clusters are usually much smaller than normal clusters. It can automatically determine the outlier rate of a database and effectively detect outliers and outlier clusters without a top-n parameter. Formal analysis and experiments show that the method achieves good performance in outlier detection. (C) 2017 Elsevier B.V. All rights reserved.
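
The general idea in the abstract (group points through a mutual neighbor graph, then treat unusually small groups as outliers) can be illustrated with a minimal sketch. This is not the authors' ROCF algorithm: the abstract does not give its definitions, so the function names `mutual_knn_clusters` and `flag_small_clusters`, the fixed neighbor count `k`, and the `min_jump` size-ratio threshold are all hypothetical choices made for illustration only.

```python
from math import dist

def mutual_knn_clusters(points, k):
    """Connected components of the mutual k-nearest-neighbor graph.

    Hypothetical helper: two points are linked only if each one is among
    the other's k nearest neighbors. Returns clusters (lists of point
    indices) sorted by ascending size.
    """
    n = len(points)
    knn = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: dist(points[i], points[j]))
        knn.append(set(others[:k]))

    # Union-find over the mutual edges.
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(n):
        for j in knn[i]:
            if i in knn[j]:  # keep the edge only if it is mutual
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values(), key=len)

def flag_small_clusters(clusters, min_jump=2.5):
    """Flag every cluster that lies before the largest relative size jump.

    Assumed heuristic (not the paper's ROCF definition): `clusters` must be
    sorted by ascending size; if the largest ratio between consecutive
    sizes is below `min_jump`, nothing is flagged.
    """
    sizes = [len(c) for c in clusters]
    if len(sizes) < 2:
        return []
    jumps = [sizes[i + 1] / sizes[i] for i in range(len(sizes) - 1)]
    cut = max(range(len(jumps)), key=jumps.__getitem__)
    if jumps[cut] < min_jump:
        return []
    return sorted(i for c in clusters[:cut + 1] for i in c)

# Six points on a line, a distant pair, and one isolated point.
points = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0),
          (50, 0), (51, 0), (20, 30)]
clusters = mutual_knn_clusters(points, k=2)
print(flag_small_clusters(clusters))  # [6, 7, 8]
```

Note that no count of outliers is passed in: the number of flagged points follows from the cluster sizes, which is the property the abstract attributes to avoiding a top-n parameter. The sketch still needs `k` and `min_jump`, so it is only a rough analogue.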

Recommended

Article Computer Science, Information Systems

Energy and Migration Cost-Aware Dynamic Virtual Machine Consolidation in Heterogeneous Cloud Datacenters

Quanwang Wu, Fuyuki Ishikawa, Qingsheng Zhu, Yunni Xia

IEEE TRANSACTIONS ON SERVICES COMPUTING (2019)

Article Computer Science, Artificial Intelligence

A parameter-free hybrid instance selection algorithm based on local sets with natural neighbors

Junnan Li, Qingsheng Zhu, Quanwang Wu

APPLIED INTELLIGENCE (2020)

Article Automation & Control Systems

MOELS: Multiobjective Evolutionary List Scheduling for Cloud Workflows

Quanwang Wu, MengChu Zhou, Qingsheng Zhu, Yunni Xia, Junhao Wen

IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING (2020)

Article Computer Science, Artificial Intelligence

A boosting Self-Training Framework based on Instance Generation with Natural Neighbors for K Nearest Neighbor

Junnan Li, Qingsheng Zhu

APPLIED INTELLIGENCE (2020)

Article Computer Science, Artificial Intelligence

An effective framework based on local cores for self-labeled semi-supervised classification

Junnan Li, Qingsheng Zhu, Quanwang Wu, Dongdong Cheng

KNOWLEDGE-BASED SYSTEMS (2020)

Article Computer Science, Artificial Intelligence

Clustering with Local Density Peaks-Based Minimum Spanning Tree

Dongdong Cheng, Qingsheng Zhu, Jinlong Huang, Quanwang Wu, Lijun Yang

Summary: The paper introduces a novel MST-based clustering algorithm LDP-MST, which utilizes local density peaks and a new distance measurement method to effectively discover clusters with complex structures. The experimental results demonstrate that the proposed algorithm is competitive with state-of-the-art methods in cluster discovery.

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING (2021)

Article Computer Science, Information Systems

A novel oversampling technique for class-imbalanced learning based on SMOTE and natural neighbors

Junnan Li, Qingsheng Zhu, Quanwang Wu, Zhu Fan

Summary: Class imbalance is a significant factor leading to performance deterioration in classifiers. Techniques such as SMOTE and its extension, NaNSMOTE, have been successful in addressing this issue and have been proven effective on real data sets.

INFORMATION SCIENCES (2021)

Article Computer Science, Artificial Intelligence

A novel hierarchical clustering algorithm with merging strategy based on shared subordinates

Jinxin Shi, Qingsheng Zhu, Junnan Li

Summary: Hierarchical clustering is a common unsupervised learning technique used to discover relationships in data sets. A novel Hierarchical Clustering algorithm with a Merging strategy based on Shared Subordinates (HCMSS) is proposed to overcome challenges such as inaccuracy and high time cost. Experiments show that HCMSS can effectively improve clustering accuracy and save time compared to state-of-the-art benchmarks.

APPLIED INTELLIGENCE (2022)

Article Computer Science, Artificial Intelligence

Non-parameter clustering algorithm based on saturated neighborhood graph

Jinghui Zhang, Lijun Yang, Yong Zhang, Dongming Tang, Tao Liu

Summary: This paper introduces a non-parameter clustering algorithm based on saturated neighborhood graph (NPCSNG), which preprocesses the data set using mathematical methods and clusters the data using the characteristics of SNG adaptive clustering to achieve non-parameter clustering. The NPCSNG algorithm has the advantages of not requiring manual parameter setting, significantly improving clustering performance and model robustness, and adapting easily to data sets with complex manifold structure.

APPLIED SOFT COMPUTING (2022)

Article Computer Science, Information Systems

Energy-Efficient Joint Collaborative and Passive Beamforming for Intelligent-Reflecting-Surface-Assisted Wireless Sensor Networks

Tao Liu, Xiaomei Qu, Wenrong Tan, Ruihan Wen, Lijun Yang

Summary: This study proposes an energy-efficient joint collaborative and passive beamforming design for an Intelligent Reflecting Surface (IRS)-assisted Wireless Sensor Network (WSN), aiming to maximize the network lifetime. Through the development of a penalty dual-decomposition (PDD)-based algorithm, the joint optimization problem is efficiently solved, and a low computational complexity approximate iteration algorithm is proposed for the IRS's phase-shift optimization subproblem.

IEEE INTERNET OF THINGS JOURNAL (2023)

Article Computer Science, Artificial Intelligence

A Multiplier Bootstrap Approach to Designing Robust Algorithms for Contextual Bandits

Hong Xie, Qiao Tang, Qingsheng Zhu

Summary: In this study, we propose an estimator based on the multiplier bootstrap technique to improve the application of Upper Confidence Bound (UCB) algorithms in Contextual Bandit (CB) problems. The estimator adaptively converges to the ground truth and has theoretical guarantees on the convergence. Extensive experiments on synthetic and real-world datasets validate the superior performance of the proposed estimator.

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS (2023)

Article Computer Science, Artificial Intelligence

Confidence-based and sample-reweighted test-time adaptation

Hao Yang, Min Wang, Zhengfei Yu, Hang Zhang, Jinshen Jiang, Yun Zhou

Summary: In this paper, a novel method called CSTTA is proposed for test time adaptation (TTA), which utilizes confidence-based optimization and sample reweighting to better utilize sample information. Extensive experiments demonstrate the effectiveness of the proposed method.

KNOWLEDGE-BASED SYSTEMS (2024)

Article Computer Science, Artificial Intelligence

A novel method for generating a canonical basis for decision implications based on object-induced three-way operators

Jin Liu, Ju-Sheng Mi, Dong-Yun Niu

Summary: This article focuses on a novel method for generating a canonical basis for decision implications based on object-induced operators (OE operators). The logic of decision implication based on OE operators is described, and a method for obtaining the canonical basis for decision implications is given. The completeness, nonredundancy, and optimality of the canonical basis are proven. Additionally, a method for generating true premises based on OE operators is proposed.

KNOWLEDGE-BASED SYSTEMS (2024)

Review Computer Science, Artificial Intelligence

Efficient utilization of pre-trained models: A review of sentiment analysis via prompt learning

Kun Bu, Yuanchao Liu, Xiaolong Ju

Summary: This paper discusses the importance of sentiment analysis and pre-trained models in natural language processing, and explores the application of prompt learning. The research shows that prompt learning is more suitable for sentiment analysis tasks and can achieve good performance.

KNOWLEDGE-BASED SYSTEMS (2024)

Article Computer Science, Artificial Intelligence

M-EDEM: A MNN-based Empirical Decomposition Ensemble Method for improved time series forecasting

Xiangjun Cai, Dagang Li

Summary: This paper presents a new decomposition mechanism based on learned decomposition mapping. By using a neural network to learn the relationship between original time series and decomposed results, the repetitive computation overhead during rolling decomposition is relieved. Additionally, extended mapping and partial decomposition methods are proposed to alleviate boundary effects on prediction performance. Comparative studies demonstrate that the proposed method outperforms existing RDEMs in terms of operation speed and prediction accuracy.

KNOWLEDGE-BASED SYSTEMS (2024)

Article Computer Science, Artificial Intelligence

Privacy-preserving trust management method based on blockchain for cross-domain industrial IoT

Xu Wu, Yang Liu, Jie Tian, Yuanpeng Li

Summary: This paper proposes a blockchain-based privacy-preserving trust management architecture, which adopts federated learning to train task-specific trust models and utilizes differential privacy to protect device privacy. In addition, a game theory-based incentive mechanism and a parallel consensus protocol are proposed to improve the accuracy of trust computing and the efficiency of consensus.

KNOWLEDGE-BASED SYSTEMS (2024)

Article Computer Science, Artificial Intelligence

MV-ReID: 3D Multi-view Transformation Network for Occluded Person Re-Identification

Zaiyang Yu, Prayag Tiwari, Luyang Hou, Lusi Li, Weijun Li, Limin Jiang, Xin Ning

Summary: This study introduces a 3D view-based approach that effectively handles occlusions and leverages the geometric information of 3D objects. The proposed method achieves state-of-the-art results on occluded ReID tasks and exhibits competitive performance on holistic ReID tasks.

KNOWLEDGE-BASED SYSTEMS (2024)

Article Computer Science, Artificial Intelligence

City-scale continual neural semantic mapping with three-layer sampling and panoptic representation

Yongliang Shi, Runyi Yang, Zirui Wu, Pengfei Li, Caiyun Liu, Hao Zhao, Guyue Zhou

Summary: Neural implicit representations have gained attention due to their expressive, continuous, and compact properties. However, there is still a lack of research on city-scale continual implicit dense mapping based on sparse LiDAR input. In this study, a city-scale continual neural mapping system with a panoptic representation is developed, incorporating environment-level and instance-level modeling. A tailored three-layer sampling strategy and category-specific prior are proposed to address the challenges of representing geometric information in city-scale space and achieving high fidelity mapping of instances under incomplete observation.

KNOWLEDGE-BASED SYSTEMS (2024)

Article Computer Science, Artificial Intelligence

MDSSN: An end-to-end deep network on triangle mesh parameterization

Ruihan Hu, Zhi-Ri Tang, Rui Yang, Zhongjie Wang

Summary: Mesh data is crucial for 3D computer vision applications worldwide, but traditional deep learning frameworks have struggled with handling meshes. This paper proposes MDSSN, a simple mesh computation framework that models triangle meshes and represents their shape using face-based and edge-based Riemannian graphs. The framework incorporates end-to-end operators inspired by traditional deep learning frameworks, and includes dedicated modules for addressing challenges in mesh classification and segmentation tasks. Experimental results demonstrate that MDSSN outperforms other state-of-the-art approaches.

KNOWLEDGE-BASED SYSTEMS (2024)

Article Computer Science, Artificial Intelligence

Semi-supervised learning with missing values imputation

Buliao Huang, Yunhui Zhu, Muhammad Usman, Huanhuan Chen

Summary: This paper proposes a novel semi-supervised conditional normalizing flow (SSCFlow) algorithm that combines unsupervised imputation and supervised classification. By estimating the conditional distribution of incomplete instances, SSCFlow facilitates imputation and classification simultaneously, addressing the issue of separated tasks ignoring data distribution and label information in traditional methods.

KNOWLEDGE-BASED SYSTEMS (2024)

Article Computer Science, Artificial Intelligence

Emotion-and-knowledge grounded response generation in an open-domain dialogue setting

Deeksha Varshney, Asif Ekbal, Erik Cambria

Summary: This paper focuses on the neural-based interactive dialogue system that aims to engage and retain humans in long-lasting conversations. It proposes a new neural generative model that combines step-wise co-attention, self-attention-based transformer network, and an emotion classifier to control emotion and knowledge transfer during response generation. The results from quantitative, qualitative, and human evaluation show that the proposed models can generate natural and coherent sentences, capturing essential facts with significant improvement over emotional content.

KNOWLEDGE-BASED SYSTEMS (2024)

Article Computer Science, Artificial Intelligence

MvTS-library: An open library for deep multivariate time series forecasting

Junchen Ye, Weimiao Li, Zhixin Zhang, Tongyu Zhu, Leilei Sun, Bowen Du

Summary: Modeling multivariate time series has long been a topic of interest for scholars in various fields. This paper introduces MvTS, an open library based on Pytorch, which provides a unified framework for implementing and evaluating these models. Extensive experiments on public datasets demonstrate the effectiveness and universality of the models reproduced by MvTS.

KNOWLEDGE-BASED SYSTEMS (2024)

Article Computer Science, Artificial Intelligence

An adaptive hybrid mutated differential evolution feature selection method for low and high-dimensional medical datasets

Reham R. Mostafa, Ahmed M. Khedr, Zaher Al Aghbari, Imad Afyouni, Ibrahim Kamel, Naveed Ahmed

Summary: Feature selection is crucial in classification procedures, but it faces challenges in high-dimensional datasets. To overcome these challenges, this study proposes an Adaptive Hybrid-Mutated Differential Evolution method that incorporates the mechanics of the Spider Wasp Optimization algorithm and the concept of Enhanced Solution Quality. Experimental results demonstrate the effectiveness of the method in terms of accuracy and convergence speed, and it outperforms contemporary cutting-edge algorithms.

KNOWLEDGE-BASED SYSTEMS (2024)

Article Computer Science, Artificial Intelligence

TCM Model for improving track sequence classification in real scenarios with Multi-Feature Fusion and Transformer Block

Ti Xiang, Pin Lv, Liguo Sun, Yipu Yang, Jiuwu Hao

Summary: This paper introduces a Track Classification Model (TCM) based on marine radar, which can effectively recognize and classify shipping tracks. By using a feature extraction network with multi-feature fusion and a dataset production method to address missing labels, the classification accuracy is improved, resulting in successful engineering application in real scenarios.

KNOWLEDGE-BASED SYSTEMS (2024)

Article Computer Science, Artificial Intelligence

Language model as an Annotator: Unsupervised context-aware quality phrase generation

Zhihao Zhang, Yuan Zuo, Chenghua Lin, Junjie Wu

Summary: This paper proposes a novel unsupervised context-aware quality phrase mining framework called LMPhrase, which is built upon large pre-trained language models. The framework mines quality phrases as silver labels using a parameter-free probing technique on the pre-trained language model BERT, and formalizes the phrase tagging task as a sequence generation problem by fine-tuning the Sequence-to-Sequence pre-trained language model BART. The results of extensive experiments show that LMPhrase consistently outperforms existing competitors in two phrase mining tasks of different granularity.

KNOWLEDGE-BASED SYSTEMS (2024)

Article Computer Science, Artificial Intelligence

Stochastic Gradient Descent for matrix completion: Hybrid parallelization on shared- and distributed-memory systems

Kemal Buyukkaya, M. Ozan Karsavuran, Cevdet Aykanat

Summary: The study aims to investigate the hybrid parallelization of the Stochastic Gradient Descent (SGD) algorithm for solving the matrix completion problem on a high-performance computing platform. A hybrid parallel decentralized SGD framework with asynchronous inter-process communication and a novel flexible partitioning scheme is proposed to achieve scalability up to hundreds of processors. Experimental results on real-world benchmark datasets show that the proposed algorithm achieves 6x higher throughput on sparse datasets compared to the state-of-the-art, while achieving comparable throughput on relatively dense datasets.

KNOWLEDGE-BASED SYSTEMS (2024)