4.7 Article

Incremental approaches for heterogeneous feature selection in dynamic ordered data

Journal

INFORMATION SCIENCES
Volume 541, Issue -, Pages 475-501

Publisher

ELSEVIER SCIENCE INC
DOI: 10.1016/j.ins.2020.06.051

Keywords

Heterogeneous ordered decision system; Dominance-based neighborhood rough set; Feature selection; Matrix-based incremental algorithm

Funding

  1. National Natural Science Foundation of China [61976182, 61572406, 61573292, 61602327, 61876157, 61976245]
  2. Key Program for International S&T Cooperation of Sichuan Province [2019YFH0097]
  3. Applied Basic Research Programs of Science and Technology Department of Sichuan Province [2019YJ0084]

Feature selection can identify essential features and reduce the dimensionality of features, improving the classification ability of a learning model. In this study, we consider data with a preference-order relation, i.e., ordered data. In the big data era, ordered data contain noise and exhibit heterogeneous features (including numerical and categorical features) and dynamic characteristics (i.e., new objects are added and obsolete objects are removed over time). The dominance-based neighborhood rough set (DNRS) considers the preference order relation of heterogeneous features and demonstrates fault tolerance; thus, it can be applied well to heterogeneous feature selection in ordered data. At present, DNRS-based heterogeneous feature selection methods are only used for static ordered data. For dynamic ordered data, existing heterogeneous feature selection approaches are highly time-consuming because they are required to recalculate knowledge from scratch when multiple objects vary. Motivated by this issue, we utilize a matrix-based method in this work to study incremental heterogeneous feature selection based on DNRS in dynamic ordered data. First, we define neighborhood dominance conditional entropy (NDCE) as the uncertainty measure and introduce a non-monotonic feature selection strategy based on this measure. Second, the neighborhood dominance relation matrix and its diagonal matrix are defined to calculate NDCE in matrix form. Third, the updating mechanisms of the diagonal matrix are studied when objects vary and used to update NDCE. Lastly, two incremental feature selection algorithms are proposed when multiple objects are added to or deleted from heterogeneous ordered data. Experiments are performed on public data sets. Experimental results verify that the proposed incremental algorithms are effective and efficient for updating feature subsets in dynamic heterogeneous ordered data. © 2020 Elsevier Inc. All rights reserved.
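
The abstract does not give the formal definitions, so the following is only a rough sketch of the general idea of updating per-object counts incrementally when objects are added, not the paper's algorithm. The relation used in `dominates`, the tolerance `delta`, the entropy form in `entropy`, and all function names are assumptions made for illustration. The sketch handles numerical features only and covers the object-addition case.

```python
import math

def dominates(x, y, delta):
    # Assumed simplified relation: y is at least as good as x on every
    # feature, up to a tolerance delta. Not the paper's exact definition.
    return all(yv >= xv - delta for xv, yv in zip(x, y))

def _accumulate(X, d, cond, joint, rows, cols, delta):
    # Add the contribution of the column objects to the counts of the row objects.
    for i in rows:
        for j in cols:
            if dominates(X[i], X[j], delta):
                cond[i] += 1           # j is in the dominating set of i on the features
                if d[j] >= d[i]:
                    joint[i] += 1      # j also dominates i on the decision

def counts_from_scratch(X, d, delta):
    # Per-object counts; these play the role of diagonal entries in this sketch.
    n = len(X)
    cond, joint = [0] * n, [0] * n
    _accumulate(X, d, cond, joint, range(n), range(n), delta)
    return cond, joint

def add_objects(X, d, cond, joint, X_new, d_new, delta):
    # Incremental update: old rows are compared only against the new columns,
    # and new rows are compared against all columns.
    n, m = len(X), len(X_new)
    X_all, d_all = X + X_new, d + d_new
    cond = cond + [0] * m
    joint = joint + [0] * m
    _accumulate(X_all, d_all, cond, joint, range(n), range(n, n + m), delta)
    _accumulate(X_all, d_all, cond, joint, range(n, n + m), range(n + m), delta)
    return X_all, d_all, cond, joint

def entropy(cond, joint):
    # Assumed conditional-entropy form: -(1/n) * sum_i log(joint_i / cond_i).
    # joint_i >= 1 because every object dominates itself when delta >= 0.
    return -sum(math.log(j / c) for c, j in zip(cond, joint)) / len(cond)

# Hypothetical toy data: two numerical features and an ordinal decision.
X = [[0.2, 0.5], [0.4, 0.6], [0.9, 0.1]]
d = [1, 2, 1]
X_new = [[0.5, 0.7], [0.1, 0.1]]
d_new = [2, 1]
delta = 0.05

cond, joint = counts_from_scratch(X, d, delta)
X_all, d_all, cond_inc, joint_inc = add_objects(X, d, cond, joint, X_new, d_new, delta)
print(entropy(cond_inc, joint_inc))
```

In this sketch the incremental path performs n·m + m·(n + m) comparisons for n existing and m added objects, whereas recomputing from scratch needs (n + m)² comparisons; the saving is the n² block of comparisons among the existing objects.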

Recommended

Review Computer Science, Artificial Intelligence

A Review on Deep Neural Networks for ICD Coding

Fei Teng, Yiming Liu, Tianrui Li, Yi Zhang, Shuangqing Li, Yue Zhao

Summary: The International Classification of Diseases (ICD) is widely used for categorizing physical conditions. Manual ICD coding is time-consuming and prone to errors. Therefore, researchers are focusing on using deep neural networks for ICD automatic coding.

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING (2023)

Article Computer Science, Artificial Intelligence

Cross-Domain Knowledge Graph Chiasmal Embedding for Multi-Domain Item-Item Recommendation

Jia Liu, Wei Huang, Tianrui Li, Shenggong Ji, Junbo Zhang

Summary: This paper proposes a multi-domain item-item recommendation method based on cross-domain knowledge graph embedding, which addresses the sparsity and cold start problems faced by traditional recommender systems by analyzing the association between items within the same domain and the interaction between items across diverse domains with the aid of a rich information knowledge graph.

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING (2023)

Article Computer Science, Artificial Intelligence

Self-supervised Discriminative Representation Learning by Fuzzy Autoencoder

Wenlu Yang, Hongjun Wang, Yinghui Zhang, Zehao Liu, Tianrui Li

Summary: Representation learning based on autoencoders has attracted great attention due to its potential to capture valuable latent information. However, traditional autoencoders only focus on minimal reconstruction error and neglect the discrimination of feature representation in machine learning tasks. To overcome this limitation, an enhanced self-supervised discriminative fuzzy autoencoder (FAE) is proposed, which explores information within data to guide unsupervised training and enhance feature discrimination.

ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY (2023)

Article Computer Science, Artificial Intelligence

Robust multi-label feature selection with shared coupled and dynamic graph regularization

Lingzhi Wang, Hongmei Chen, Bo Peng, Tianrui Li, Tengyu Yin

Summary: This study proposes a robust MFS method, which addresses the issues in multi-label feature selection using graph regularization and matrix factorization. The effectiveness of the algorithm is demonstrated through experiments.

APPLIED INTELLIGENCE (2023)

Article Computer Science, Artificial Intelligence

A Possibilistic Information Fusion-Based Unsupervised Feature Selection Method Using Information Quality Measures

Pengfei Zhang, Tianrui Li, Zhong Yuan, Zhixuan Deng, Guoqiang Wang, Dexian Wang, Fan Zhang

Summary: This article proposes a novel information system based on possibility distribution, along with several defined measures of information quality. Based on this, an unsupervised feature selection algorithm is designed, which can effectively combine multiple possibilistic information while minimizing information uncertainty.

IEEE TRANSACTIONS ON FUZZY SYSTEMS (2023)

Review Computer Science, Artificial Intelligence

Image inpainting based on deep learning: A review

Xiaobo Zhang, Donghai Zhai, Tianrui Li, Yuxin Zhou, Yang Lin

Summary: This article systematically summarizes and analyzes the literature on deep learning-based image inpainting. It reviews the research status of deep learning technology in the field of image inpainting over the past 15 years and deeply studies existing image restoration methods based on different neural network structures. The article also provides constructive suggestions for future development and discusses the urgent issues that need to be solved in the field.

INFORMATION FUSION (2023)

Article Computer Science, Information Systems

A Generalized Deep Learning Algorithm Based on NMF for Multi-View Clustering

Dexian Wang, Tianrui Li, Ping Deng, Jia Liu, Wei Huang, Fan Zhang

Summary: This paper proposes a generalized deep learning multi-view clustering (GDLMC) algorithm based on non-negative matrix factorization (NMF), which improves the clustering performance of multi-view clustering by addressing the issues of weak feature extraction, slow convergence speed, and low accuracy in NMF based algorithms. The GDLMC algorithm utilizes decoupled and non-negatively restricted matrix elements, updates the elements using stochastic gradient descent with learning rate guidance, and combines generalized weights and biases with activation functions to construct generalized deep learning (GDL), which is then used to learn low-dimensional matrices for each view and a consensus matrix. Experimental results on four public datasets demonstrate the significant advantages of GDLMC.

IEEE TRANSACTIONS ON BIG DATA (2023)

Article Computer Science, Theory & Methods

RHDOFS: A Distributed Online Algorithm Towards Scalable Streaming Feature Selection

Chuan Luo, Sizhao Wang, Tianrui Li, Hongmei Chen, Jiancheng Lv, Zhang Yi

Summary: This article introduces a Rough Hypercuboid based Distributed Online Feature Selection (RHDOFS) method to address the challenges of Volume and Velocity in Big Data. It proposes a novel integrated feature evaluation criterion by exploring class separability in the boundary region. An efficient online feature selection method is developed for streaming features, and a parallel optimization mechanism is employed to accelerate the implementation. The algorithm is implemented on Apache Spark and demonstrates superior performance in comparison to other online feature selection algorithms.

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS (2023)

Article Computer Science, Artificial Intelligence

Micro-Supervised Disturbance Learning: A Perspective of Representation Probability Distribution

Jielei Chu, Jing Liu, Hongjun Wang, Hua Meng, Zhiguo Gong, Tianrui Li

Summary: Based on the idea of small perturbation, a representation learning model based on probability distribution is proposed, and two variant models, Micro-DGRBM and Micro-DRBM, are introduced. The KL divergence of SPI is minimized within the same cluster to promote the similarity of probability distributions, while it is maximized across different clusters to enforce the dissimilarity in CD learning. Experimental results demonstrate that the proposed deep Micro-DL architecture outperforms the baseline method and other shallow models and deep frameworks for clustering.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (2023)

Article Medicine, Research & Experimental

Development of a nomogram for severe influenza in previously healthy children: a retrospective cohort study

Wenyun Huang, Wensi Niu, Hongmei Chen, Wujun Jiang, Yanbing Fu, Xiuxiu Li, Minglei Li, Jun Hua, Chunxia Hu

Summary: We aimed to develop a nomogram to predict the risk of severe influenza in previously healthy children. A total of 1135 children infected with influenza in a retrospective cohort study were included. Risk factors were identified through logistic regression analysis and a nomogram was established. The nomogram showed good predictive ability in both the training and validation cohorts.

JOURNAL OF INTERNATIONAL MEDICAL RESEARCH (2023)

Article Computer Science, Artificial Intelligence

Dynamic graph-based attribute reduction approach with fuzzy rough sets

Lei Ma, Chuan Luo, Tianrui Li, Hongmei Chen, Dun Liu

Summary: With the accumulation of interesting data in various application fields, incremental datasets are becoming more common. However, selecting informative attributes from dynamically changing datasets poses challenges. Therefore, an incremental processing mechanism is desired to update the attribute reducts efficiently. In this paper, a novel dynamic graph-based fuzzy rough attribute reduction approach is proposed to handle the maintenance of fuzzy rough attribute reduction in dynamic data, which outperforms existing methods in terms of speed and quality preservation.

INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS (2023)

Article Computer Science, Artificial Intelligence

Robust unsupervised feature selection via dual space latent representation learning and adaptive structure learning

Weiyi Li, Hongmei Chen, Tianrui Li, Tengyu Yin, Chuan Luo

Summary: In this paper, a robust unsupervised feature selection method, DSLRAS, is proposed, which can capture the correlation between features and the correlation between samples through latent representation learning in both feature space and data space. Adaptive graph learning is used to maintain the local geometric structure of data more accurately, and a regularization term is added to guarantee row-sparsity and achieve better results.

INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS (2023)

Article Computer Science, Artificial Intelligence

Robust multi-view clustering in latent low-rank space with discrepancy induction

Bo Xiong, Hongmei Chen, Tianrui Li, Xiaoling Yang

Summary: Multi-view graph clustering has attracted extensive research attention due to its ability to capture consistent and complementary information between views. However, multi-view data are mostly high-dimensional and may contain redundant and irrelevant features. In addition, the original data are often contaminated by noise and outliers, affecting the reliability of the learned affinity matrix. This study proposes a robust multi-view clustering model that combines low-dimensional and low-rank latent space learning, self-representation learning, and multi-view discrepancy induction fusion. Experimental results on benchmark datasets show that the proposed model outperforms state-of-the-art comparison models in terms of robustness and clustering performance.

APPLIED INTELLIGENCE (2023)

Article Infectious Diseases

Early application of metagenomics next-generation sequencing may significantly reduce unnecessary consumption of antibiotics in patients with fever of unknown origin

Hongmei Chen, Mingze Tang, Lemeng Yao, Di Zhang, Yubin Zhang, Yingren Zhao, Han Xia, Tianyan Chen, Jie Zheng

Summary: mNGS is a novel nucleic acid method that can detect unknown and difficult pathogenic microorganisms. Its application in the etiological diagnosis of fever of unknown origin (FUO) is not well studied. This study aimed to comprehensively assess the value of mNGS in diagnosing FUO and investigate its impact on diagnosis time, hospitalization days, antibiotic consumption, and cost.

BMC INFECTIOUS DISEASES (2023)

Article Computer Science, Artificial Intelligence

Multiscale Fuzzy Entropy-Based Feature Selection

Zhihong Wang, Hongmei Chen, Zhong Yuan, Jihong Wan, Tianrui Li

Summary: This paper introduces a feature selection method based on multiscale fuzzy entropy, which improves the effectiveness of feature selection by fusing granule information at different scales.

IEEE TRANSACTIONS ON FUZZY SYSTEMS (2023)

Article Computer Science, Information Systems

A consensus model considers managing manipulative and overconfident behaviours in large-scale group decision-making

Xia Liang, Jie Guo, Peide Liu

Summary: This paper investigates a novel consensus model based on social networks to manage manipulative and overconfident behaviors in large-scale group decision-making. By proposing a novel clustering model and improved methods, the consensus reaching is effectively facilitated. The feedback mechanism and management approach are employed to handle decision makers' behaviors. Simulation experiments and comparative analysis demonstrate the effectiveness of the model.

INFORMATION SCIENCES (2024)

Article Computer Science, Information Systems

CGN: Class gradient network for the construction of adversarial samples

Xiang Li, Haiwang Guo, Xinyang Deng, Wen Jiang

Summary: This paper proposes a method based on class gradient networks for generating high-quality adversarial samples. By introducing a high-level class gradient matrix and combining classification loss and perturbation loss, the method demonstrates superiority in the transferability of adversarial samples on targeted attacks.

INFORMATION SCIENCES (2024)

Article Computer Science, Information Systems

Distinguishing latent interaction types from implicit feedbacks for recommendation

Lingyun Lu, Bang Wang, Zizhuo Zhang, Shenghao Liu

Summary: Many recommendation algorithms only rely on implicit feedbacks due to privacy concerns. However, the encoding of interaction types is often ignored. This paper proposes a relation-aware neural model that classifies implicit feedbacks by encoding edges, thereby enhancing recommendation performance.

INFORMATION SCIENCES (2024)

Article Computer Science, Information Systems

Proximity-based density description with regularized reconstruction algorithm for anomaly detection

Jaehong Yu, Hyungrok Do

Summary: This study discusses unsupervised anomaly detection using one-class classification, which determines whether a new instance belongs to the target class by constructing a decision boundary. The proposed method uses a proximity-based density description and a regularized reconstruction algorithm to overcome the limitations of existing one-class classification methods. Experimental results demonstrate the superior performance of the proposed algorithm.

INFORMATION SCIENCES (2024)

Article Computer Science, Information Systems

Non-iterative border-peeling clustering algorithm based on swap strategy

Hui Tu, Shifei Ding, Xiao Xu, Haiwei Hou, Chao Li, Ling Ding

Summary: The Border-Peeling algorithm is a density-based clustering algorithm, but its complexity and its issues on unbalanced datasets restrict its application. This paper proposes a non-iterative border-peeling clustering algorithm, which improves the clustering performance by distinguishing and associating core points and border points.

INFORMATION SCIENCES (2024)

Article Computer Science, Information Systems

A two-stage denoising framework for zero-shot learning with noisy labels

Long Tang, Pan Zhao, Zhigeng Pan, Xingxing Duan, Panos M. Pardalos

Summary: In this work, a two-stage denoising framework (TSDF) is proposed for zero-shot learning (ZSL) to address the issue of noisy labels. The framework includes a tailored loss function to remove suspected noisy-label instances and a ramp-style loss function to reduce the negative impact of remaining noisy labels. In addition, a dynamic screening strategy (DSS) is developed to efficiently handle the nonconvexity of the ramp-style loss.

INFORMATION SCIENCES (2024)

Article Computer Science, Information Systems

Selection of a viable blockchain service provider for data management within the internet of medical things: An MCDM approach to Indian healthcare

Raghunathan Krishankumar, Sundararajan Dhruva, Kattur S. Ravichandran, Samarjit Kar

Summary: Health 4.0 is gaining global attention for better healthcare through digital technologies. This study proposes a new decision-making framework for selecting viable blockchain service providers in the Internet of Medical Things (IoMT). The framework addresses the limitations in previous studies and demonstrates its applicability in the Indian healthcare sector. The results show the top ranking BSPs, the importance of various criteria, and the effectiveness of the developed model.

INFORMATION SCIENCES (2024)

Article Computer Science, Information Systems

Q-learning with heterogeneous update strategy

Tao Tan, Hong Xie, Liang Feng

Summary: This paper proposes a heterogeneous update idea and designs the HetUp Q-learning algorithm to enlarge the normalized gap by overestimating the Q-value corresponding to the optimal action and underestimating the Q-value corresponding to the other actions. To address the limitation, a softmax strategy is applied to estimate the optimal action, resulting in HetUpSoft Q-learning and HetUpSoft DQN. Extensive experimental results show significant improvements over SOTA baselines.

INFORMATION SCIENCES (2024)

Article Computer Science, Information Systems

Dyformer: A dynamic transformer-based architecture for multivariate time series classification

Chao Yang, Xianzhi Wang, Lina Yao, Guodong Long, Guandong Xu

Summary: This paper proposes a dynamic transformer-based architecture called Dyformer for multivariate time series classification. Dyformer captures multi-scale features through hierarchical pooling and adaptive learning strategies, and improves model performance by introducing feature-map-wise attention mechanisms and a joint loss function.

INFORMATION SCIENCES (2024)

Article Computer Science, Information Systems

ESSENT: an arithmetic optimization algorithm with enhanced scatter search strategy for automated test case generation

Xiguang Li, Baolu Feng, Yunhe Sun, Ammar Hawbani, Saeed Hammod Alsamhi, Liang Zhao

Summary: This paper proposes an enhanced scatter search strategy, using opposition-based learning, to solve the problem of automated test case generation based on path coverage (ATCG-PC). The proposed ESSENT algorithm selects the path with the lowest path entropy among the uncovered paths as the target path and generates new test cases to cover the target path by modifying the dimensions of existing test cases. Experimental results show that the ESSENT algorithm outperforms other state-of-the-art algorithms, achieving maximum path coverage with fewer test cases.

INFORMATION SCIENCES (2024)

Article Computer Science, Information Systems

An attention based approach for automated account linkage in federated identity management

Shirin Dabbaghi Varnosfaderani, Piotr Kasprzak, Aytaj Badirova, Ralph Krimmel, Christof Pohl, Ramin Yahyapour

Summary: Linking digital accounts belonging to the same user is crucial for security, user satisfaction, and next-generation service development. However, research on account linkage is mainly focused on social networks, and there is a lack of studies in other domains. To address this, we propose SmartSSO, a framework that automates the account linkage process by analyzing user routines and behavior during login processes. Our experiments on a large dataset show that SmartSSO achieves over 98% accuracy in hit-precision.

INFORMATION SCIENCES (2024)

Article Computer Science, Information Systems

A memetic algorithm with fuzzy-based population control for the joint order batching and picker routing problem

Renchao Wu, Jianjun He, Xin Li, Zuguo Chen

Summary: This paper proposes a memetic algorithm with fuzzy-based population control (MA-FPC) to solve the joint order batching and picker routing problem (JOBPRP). The algorithm incorporates batch exchange crossover and a two-level local improvement procedure. Experimental results show that MA-FPC outperforms existing algorithms in terms of solution quality.

INFORMATION SCIENCES (2024)

Article Computer Science, Information Systems

Refining one-class representation: A unified transformer for unsupervised time-series anomaly detection

Guoxiang Zhong, Fagui Liu, Jun Jiang, Bin Wang, C. L. Philip Chen

Summary: In this study, we propose the AMFormer framework to address the problem of mixed normal and anomaly samples in deep unsupervised time-series anomaly detection. By refining the one-class representation and introducing the masked operation mechanism and cost sensitive learning theory, our approach significantly improves anomaly detection performance.

INFORMATION SCIENCES (2024)

Article Computer Science, Information Systems

A data-driven optimisation method for a class of problems with redundant variables and indefinite objective functions

Jin Zhou, Kang Zhou, Gexiang Zhang, Ferrante Neri, Wangyang Shen, Weiping Jin

Summary: In this paper, the authors focus on the issue of multi-objective optimisation problems with redundant variables and indefinite objective functions (MOPRVIF) in practical problem-solving. They propose a dual data-driven method for solving this problem, which consists of eliminating redundant variables, constructing objective functions, selecting evolution operators, and using a multi-objective evolutionary algorithm. The experiments conducted on two different problem domains demonstrate the effectiveness, practicality, and scalability of the proposed method.

INFORMATION SCIENCES (2024)

Article Computer Science, Information Systems

A Monte Carlo fuzzy logistic regression framework against imbalance and separation

Georgios Charizanos, Haydar Demirhan, Duygu Icen

Summary: This article proposes a new fuzzy logistic regression framework that addresses the problems of separation and imbalance while maintaining the interpretability of classical logistic regression. By fuzzifying binary variables and classifying subjects based on a fuzzy threshold, the framework demonstrates superior performance on imbalanced datasets.

INFORMATION SCIENCES (2024)