4.7 Article

Multilabel feature selection using ML-ReliefF and neighborhood mutual information for multilabel neighborhood decision systems

Journal

INFORMATION SCIENCES
Volume 537, Issue -, Pages 401-424

Publisher

ELSEVIER SCIENCE INC
DOI: 10.1016/j.ins.2020.05.102

Keywords

Neighborhood rough sets; Multilabel feature selection; ReliefF; Neighborhood mutual information; Multilabel classification

Funding

  1. National Natural Science Foundation of China [61772176, 61976082, 61976120, 61672332]
  2. Plan of Scientific Innovation Talent of Henan Province [184100510003]
  3. Young Scholar Program of Henan Province [2017GGJS041]
  4. Natural Science Foundation of Henan Province [162300410178]
  5. Natural Science Foundation of Jiangsu Province [BK20191445]
  6. Six Talent Peaks Project of Jiangsu Province [XYDXXJS-048]
  7. Qing Lan Project of Jiangsu Province

Feature selection as an essential preprocessing step in multilabel classification has been widely researched. Due to the diversity and complexity of multilabel datasets, some feature selection methods are unstable and yield low predictive accuracy. To address these issues, this paper presents a novel multilabel feature selection method using multilabel ReliefF (ML-ReliefF) and neighborhood mutual information in multilabel neighborhood decision systems. First, to solve the problem of the few available randomly selected samples when searching the nearest samples in ReliefF, the coefficient of difference and the average distance among the nearest similar and heterogeneous samples are introduced to evaluate the differences among the samples, and then the average differences among the similar or heterogeneous samples are defined. Using the Jaccard correlation coefficient, a new formula for updating feature weights is presented. Second, the margin of the sample is studied to granulate all samples under each label, and the concept of the neighborhood is given. By combining algebra with information views, some neighborhood entropy-based uncertainty measures for multilabel classification are investigated, and new neighborhood mutual information is proposed. Furthermore, an optimization objective function is constructed to evaluate the candidate features in multilabel neighborhood decision systems, all the properties are discussed, and the relationships of these measures are established. Finally, an improved ML-ReliefF algorithm is designed for preliminarily eliminating unrelated features to decrease the computational complexity for multilabel classification, and a heuristic forward multilabel feature selection algorithm is developed to remove redundant features and improve classification performance. 
Experiments on thirteen multilabel datasets compare the proposed algorithms with representative methods and verify their effectiveness in multilabel neighborhood decision systems. © 2020 Elsevier Inc. All rights reserved.
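
The abstract names two building blocks, the Jaccard coefficient between label sets and neighborhood mutual information, without giving their formulas. The sketch below illustrates the general idea only and is not the paper's method: it uses the textbook Jaccard similarity and a commonly used form of neighborhood entropy from the neighborhood rough set literature, not the new measures the paper proposes. The function names, the Euclidean distance, the radius `delta`, and the choice to take the joint neighborhood as the intersection of the two neighborhoods are all assumptions made for illustration.

```python
import numpy as np


def jaccard_similarity(a, b):
    """Jaccard similarity between two binary label vectors."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        # Assumed convention: two empty label sets count as identical.
        return 1.0
    return np.logical_and(a, b).sum() / union


def neighborhoods(X, delta):
    """Boolean matrix N with N[i, j] True when sample j lies within
    Euclidean distance delta of sample i (assumed metric)."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    return dist <= delta


def neighborhood_entropy(N):
    """Entropy computed from neighborhood sizes: -mean(log2(|N(x_i)| / n))."""
    n = N.shape[0]
    return -np.mean(np.log2(N.sum(axis=1) / n))


def neighborhood_mutual_information(X_a, X_b, delta):
    """H(A) + H(B) - H(A, B), where the joint neighborhood is assumed
    to be the intersection of the two individual neighborhoods."""
    N_a = neighborhoods(X_a, delta)
    N_b = neighborhoods(X_b, delta)
    N_ab = N_a & N_b
    return (neighborhood_entropy(N_a)
            + neighborhood_entropy(N_b)
            - neighborhood_entropy(N_ab))
```

Under these assumptions a feature shares all of its neighborhood entropy with itself and none with a constant feature, which is the behavior a relevance measure for ranking candidate features would need.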

Recommended

Article Physics, Multidisciplinary

LSSVR Model of G-L Mixed Noise-Characteristic with Its Applications

Shiguang Zhang, Ting Zhou, Lin Sun, Wei Wang, Baofang Chang

ENTROPY (2020)

Article Computer Science, Artificial Intelligence

Three-way decision models based on multigranulation support intuitionistic fuzzy rough sets

Zhan'ao Xue, Liping Zhao, Lin Sun, Min Zhang, Tianyu Xue

INTERNATIONAL JOURNAL OF APPROXIMATE REASONING (2020)

Article Computer Science, Artificial Intelligence

Feature Selection Using Fuzzy Neighborhood Entropy-Based Uncertainty Measures for Fuzzy Neighborhood Multigranulation Rough Sets

Lin Sun, Lanying Wang, Weiping Ding, Yuhua Qian, Jiucheng Xu

Summary: The article introduces a feature selection method based on FNMRS for preprocessing data and improving its classification performance in heterogeneous data sets. The approach constructs uncertainty measures using fuzzy neighborhood rough sets and neighborhood multigranulation rough sets, and provides optimistic and pessimistic FNMRS models along with fuzzy neighborhood entropy-based uncertainty measures. Additionally, the Fisher score model is utilized to reduce the complexity of high-dimensional data sets by deleting irrelevant features and a forward feature selection algorithm is presented to enhance the performance of heterogeneous data classification.

IEEE TRANSACTIONS ON FUZZY SYSTEMS (2021)

Article Computer Science, Artificial Intelligence

Feature selection using binary monarch butterfly optimization

Lin Sun, Shanshan Si, Jing Zhao, Jiucheng Xu, Yaojin Lin, Zhiying Lv

Summary: This paper proposes two mechanisms for improving the binary Monarch Butterfly Optimization (BMBO) algorithm in metaheuristic feature selection. The new mechanisms include the introduction of transfer functions to convert continuous space into binary and the design of two BMBO models based on these transfer functions. Additionally, a new step length parameter is proposed to update the position of the butterfly, and local disturbance and group division strategies are added to prevent the algorithm from falling into local optima. Experimental results show that the designed algorithm has great classification efficiency compared to other related technologies.

APPLIED INTELLIGENCE (2023)

Article Computer Science, Artificial Intelligence

Feature Selection With Missing Labels Using Multilabel Fuzzy Neighborhood Rough Sets and Maximum Relevance Minimum Redundancy

Lin Sun, Tengyu Yin, Weiping Ding, Yuhua Qian, Jiucheng Xu

Summary: This article presents a feature selection method based on multilabel fuzzy neighborhood rough sets and maximum relevance minimum redundancy for multilabel data with missing labels. Experiments verify the effectiveness of the method in recovering missing labels and selecting significant features.

IEEE TRANSACTIONS ON FUZZY SYSTEMS (2022)

Article Computer Science, Artificial Intelligence

Two-stage-neighborhood-based multilabel classification for incomplete data with missing labels

Lin Sun, Tianxiang Wang, Weiping Ding, Jiucheng Xu, Anhui Tan

Summary: This paper presents a neighborhood-based multilabel classification method for dealing with missing labels in real-world multilabel data. By defining the neighborhood radius, restoring missing feature values, and investigating the fuzzy similarity relationship among samples, the classification performance of data with missing labels is improved.

INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS (2022)

Article Computer Science, Information Systems

AFNFS: Adaptive fuzzy neighborhood-based feature selection with adaptive synthetic over-sampling for imbalanced data

Lin Sun, Mengmeng Li, Weiping Ding, En Zhang, Xiaoxia Mu, Jiucheng Xu

Summary: This paper proposes a novel adaptive fuzzy neighborhood-based feature selection method for imbalanced data with adaptive synthetic over-sampling. It addresses the limitations of manually setting fuzzy neighborhood radius and potential ignorance of boundary regions, and achieves effective classification results.

INFORMATION SCIENCES (2022)

Article Computer Science, Artificial Intelligence

TSFNFR: Two-stage fuzzy neighborhood-based feature reduction with binary whale optimization algorithm for imbalanced data classification

Lin Sun, Xinya Wang, Weiping Ding, Jiucheng Xu

Summary: This study developed a two-stage feature reduction model using fuzzy neighborhood rough sets and the binary whale optimization algorithm to address challenges in imbalanced data classification. Experimental results demonstrated the efficiency of the proposed algorithm for two-class and multiclass datasets.

KNOWLEDGE-BASED SYSTEMS (2022)

Article Computer Science, Artificial Intelligence

AMFSA: Adaptive fuzzy neighborhood-based multilabel feature selection with ant colony optimization

Lin Sun, Yusheng Chen, Weiping Ding, Jiucheng Xu, Yuanyuan Ma

Summary: This article proposes a novel adaptive fuzzy neighborhood-based multilabel feature subset selection approach with ant colony optimization (ACO) for multilabel classification. It addresses the issue of ignoring correlations among labels and the manual setting of neighborhood radius in existing feature selection models. The approach combines feature cosine similarity and label Jaccard similarity to effectively reflect overall similarity between samples, and utilizes dynamic adjustment coefficients to control label similarity importance. Experimental results demonstrate the effectiveness of the proposed algorithm in achieving excellent feature subset for multilabel classification.

APPLIED SOFT COMPUTING (2023)

Article Computer Science, Artificial Intelligence

Multiobjective sparrow search feature selection with sparrow ranking and preference information and its applications for high-dimensional data

Lin Sun, Shanshan Si, Weiping Ding, Xinya Wang, Jiucheng Xu

Summary: This paper proposes a multiobjective sparrow search feature selection approach to address the challenges of balancing convergence and diversity in nondominated solutions. The approach combines the updating formula of observers with the mutualism phase of the symbiotic organisms search algorithm to improve the search ability. The paper also introduces sparrow ranking, feature ranking, and a preference information-based mutation algorithm to enhance the diversity of solutions and guide the population towards better solutions.

APPLIED SOFT COMPUTING (2023)

Article Computer Science, Artificial Intelligence

Partial Multilabel Learning Using Fuzzy Neighborhood-Based Ball Clustering and Kernel Extreme Learning Machine

Lin Sun, Tianxiang Wang, Weiping Ding, Jiucheng Xu

Summary: This study develops a novel Partial Multilabel Learning (PML) model that addresses some issues in traditional PML models by introducing fuzzy neighborhood-based ball clustering and kernel extreme learning machine (KELM). The model preprocesses the data with ball k-means clustering, designs a new ball clustering model, develops the particle-ball fusion strategy, studies fuzzy membership functions and label enhancement, and constructs a nonsmooth convex objective function. Experimental results on 14 datasets confirm the effectiveness of the proposed algorithm.

IEEE TRANSACTIONS ON FUZZY SYSTEMS (2023)

Article Computer Science, Artificial Intelligence

TFSFB: Two-stage feature selection via fusing fuzzy multi-neighborhood rough set with binary whale optimization for imbalanced data

Lin Sun, Shanshan Si, Weiping Ding, Xinya Wang, Jiucheng Xu

Summary: This study proposes a new feature subset selection scheme to deal with imbalanced data by fusing fuzzy multi-neighborhood rough set (FMRS) and binary whale optimization algorithm (BWOA). The method evaluates the distribution of different features using the standard deviation coefficient and constructs a fuzzy multi-neighborhood radius set. It also introduces fuzzy multi-neighborhood granule and fuzzy membership degree to establish FMRS, and develops a feature significance measure to balance the properties and influences of different features. Experimental results demonstrate the effectiveness of the proposed algorithm for classification of imbalanced data.

INFORMATION FUSION (2023)

Article Computer Science, Information Systems

Twin Least Squares Support Vector Regression of Heteroscedastic Gaussian Noise Model

Shiguang Zhang, Chao Liu, Ting Zhou, Lin Sun

IEEE ACCESS (2020)

Article Computer Science, Information Systems

Multilabel Feature Selection Using Mutual Information and ML-ReliefF for Multilabel Classification

Enhui Shi, Lin Sun, Jiucheng Xu, Shiguang Zhang

IEEE ACCESS (2020)

Article Computer Science, Information Systems

A consensus model considers managing manipulative and overconfident behaviours in large-scale group decision-making

Xia Liang, Jie Guo, Peide Liu

Summary: This paper investigates a novel consensus model based on social networks to manage manipulative and overconfident behaviors in large-scale group decision-making. By proposing a novel clustering model and improved methods, the consensus reaching is effectively facilitated. The feedback mechanism and management approach are employed to handle decision makers' behaviors. Simulation experiments and comparative analysis demonstrate the effectiveness of the model.

INFORMATION SCIENCES (2024)

Article Computer Science, Information Systems

CGN: Class gradient network for the construction of adversarial samples

Xiang Li, Haiwang Guo, Xinyang Deng, Wen Jiang

Summary: This paper proposes a method based on class gradient networks for generating high-quality adversarial samples. By introducing a high-level class gradient matrix and combining classification loss and perturbation loss, the method demonstrates superiority in the transferability of adversarial samples on targeted attacks.

INFORMATION SCIENCES (2024)

Article Computer Science, Information Systems

Distinguishing latent interaction types from implicit feedbacks for recommendation

Lingyun Lu, Bang Wang, Zizhuo Zhang, Shenghao Liu

Summary: Many recommendation algorithms only rely on implicit feedbacks due to privacy concerns. However, the encoding of interaction types is often ignored. This paper proposes a relation-aware neural model that classifies implicit feedbacks by encoding edges, thereby enhancing recommendation performance.

INFORMATION SCIENCES (2024)

Article Computer Science, Information Systems

Proximity-based density description with regularized reconstruction algorithm for anomaly detection

Jaehong Yu, Hyungrok Do

Summary: This study discusses unsupervised anomaly detection using one-class classification, which determines whether a new instance belongs to the target class by constructing a decision boundary. The proposed method uses a proximity-based density description and a regularized reconstruction algorithm to overcome the limitations of existing one-class classification methods. Experimental results demonstrate the superior performance of the proposed algorithm.

INFORMATION SCIENCES (2024)

Article Computer Science, Information Systems

Non-iterative border-peeling clustering algorithm based on swap strategy

Hui Tu, Shifei Ding, Xiao Xu, Haiwei Hou, Chao Li, Ling Ding

Summary: Border-Peeling algorithm is a density-based clustering algorithm, but its complexity and issues on unbalanced datasets restrict its application. This paper proposes a non-iterative border-peeling clustering algorithm, which improves the clustering performance by distinguishing and associating core points and border points.

INFORMATION SCIENCES (2024)

Article Computer Science, Information Systems

A two-stage denoising framework for zero-shot learning with noisy labels

Long Tang, Pan Zhao, Zhigeng Pan, Xingxing Duan, Panos M. Pardalos

Summary: In this work, a two-stage denoising framework (TSDF) is proposed for zero-shot learning (ZSL) to address the issue of noisy labels. The framework includes a tailored loss function to remove suspected noisy-label instances and a ramp-style loss function to reduce the negative impact of remaining noisy labels. In addition, a dynamic screening strategy (DSS) is developed to efficiently handle the nonconvexity of the ramp-style loss.

INFORMATION SCIENCES (2024)

Article Computer Science, Information Systems

Selection of a viable blockchain service provider for data management within the internet of medical things: An MCDM approach to Indian healthcare

Raghunathan Krishankumar, Sundararajan Dhruva, Kattur S. Ravichandran, Samarjit Kar

Summary: Health 4.0 is gaining global attention for better healthcare through digital technologies. This study proposes a new decision-making framework for selecting viable blockchain service providers in the Internet of Medical Things (IoMT). The framework addresses the limitations in previous studies and demonstrates its applicability in the Indian healthcare sector. The results show the top ranking BSPs, the importance of various criteria, and the effectiveness of the developed model.

INFORMATION SCIENCES (2024)

Article Computer Science, Information Systems

Q-learning with heterogeneous update strategy

Tao Tan, Hong Xie, Liang Feng

Summary: This paper proposes a heterogeneous update idea and designs HetUp Q-learning algorithm to enlarge the normalized gap by overestimating the Q-value corresponding to the optimal action and underestimating the Q-value corresponding to the other actions. To address the limitation, a softmax strategy is applied to estimate the optimal action, resulting in HetUpSoft Q-learning and HetUpSoft DQN. Extensive experimental results show significant improvements over SOTA baselines.

INFORMATION SCIENCES (2024)

Article Computer Science, Information Systems

Dyformer: A dynamic transformer-based architecture for multivariate time series classification

Chao Yang, Xianzhi Wang, Lina Yao, Guodong Long, Guandong Xu

Summary: This paper proposes a dynamic transformer-based architecture called Dyformer for multivariate time series classification. Dyformer captures multi-scale features through hierarchical pooling and adaptive learning strategies, and improves model performance by introducing feature-map-wise attention mechanisms and a joint loss function.

INFORMATION SCIENCES (2024)

Article Computer Science, Information Systems

ESSENT: an arithmetic optimization algorithm with enhanced scatter search strategy for automated test case generation

Xiguang Li, Baolu Feng, Yunhe Sun, Ammar Hawbani, Saeed Hammod Alsamhi, Liang Zhao

Summary: This paper proposes an enhanced scatter search strategy, using opposition-based learning, to solve the problem of automated test case generation based on path coverage (ATCG-PC). The proposed ESSENT algorithm selects the path with the lowest path entropy among the uncovered paths as the target path and generates new test cases to cover the target path by modifying the dimensions of existing test cases. Experimental results show that the ESSENT algorithm outperforms other state-of-the-art algorithms, achieving maximum path coverage with fewer test cases.

INFORMATION SCIENCES (2024)

Article Computer Science, Information Systems

An attention based approach for automated account linkage in federated identity management

Shirin Dabbaghi Varnosfaderani, Piotr Kasprzak, Aytaj Badirova, Ralph Krimmel, Christof Pohl, Ramin Yahyapour

Summary: Linking digital accounts belonging to the same user is crucial for security, user satisfaction, and next-generation service development. However, research on account linkage is mainly focused on social networks, and there is a lack of studies in other domains. To address this, we propose SmartSSO, a framework that automates the account linkage process by analyzing user routines and behavior during login processes. Our experiments on a large dataset show that SmartSSO achieves over 98% accuracy in hit-precision.

INFORMATION SCIENCES (2024)

Article Computer Science, Information Systems

A memetic algorithm with fuzzy-based population control for the joint order batching and picker routing problem

Renchao Wu, Jianjun He, Xin Li, Zuguo Chen

Summary: This paper proposes a memetic algorithm with fuzzy-based population control (MA-FPC) to solve the joint order batching and picker routing problem (JOBPRP). The algorithm incorporates batch exchange crossover and a two-level local improvement procedure. Experimental results show that MA-FPC outperforms existing algorithms in terms of solution quality.

INFORMATION SCIENCES (2024)

Article Computer Science, Information Systems

Refining one-class representation: A unified transformer for unsupervised time-series anomaly detection

Guoxiang Zhong, Fagui Liu, Jun Jiang, Bin Wang, C. L. Philip Chen

Summary: In this study, we propose the AMFormer framework to address the problem of mixed normal and anomaly samples in deep unsupervised time-series anomaly detection. By refining the one-class representation and introducing the masked operation mechanism and cost sensitive learning theory, our approach significantly improves anomaly detection performance.

INFORMATION SCIENCES (2024)

Article Computer Science, Information Systems

A data-driven optimisation method for a class of problems with redundant variables and indefinite objective functions

Jin Zhou, Kang Zhou, Gexiang Zhang, Ferrante Neri, Wangyang Shen, Weiping Jin

Summary: In this paper, the authors focus on the issue of multi-objective optimisation problems with redundant variables and indefinite objective functions (MOPRVIF) in practical problem-solving. They propose a dual data-driven method for solving this problem, which consists of eliminating redundant variables, constructing objective functions, selecting evolution operators, and using a multi-objective evolutionary algorithm. The experiments conducted on two different problem domains demonstrate the effectiveness, practicality, and scalability of the proposed method.

INFORMATION SCIENCES (2024)

Article Computer Science, Information Systems

A Monte Carlo fuzzy logistic regression framework against imbalance and separation

Georgios Charizanos, Haydar Demirhan, Duygu Icen

Summary: This article proposes a new fuzzy logistic regression framework that addresses the problems of separation and imbalance while maintaining the interpretability of classical logistic regression. By fuzzifying binary variables and classifying subjects based on a fuzzy threshold, the framework demonstrates superior performance on imbalanced datasets.

INFORMATION SCIENCES (2024)