Article

Robust hierarchical feature selection driven by data and knowledge

Journal

INFORMATION SCIENCES
Volume 551, Pages 341-357

Publisher

ELSEVIER SCIENCE INC
DOI: 10.1016/j.ins.2020.11.003

Keywords

Feature selection; Hierarchical classification; Multi-granularity; Data-driven; Knowledge-driven

Funding

  1. National Natural Science Foundation of China [61703196, 62006221]
  2. Natural Science Foundation of Fujian Province [2018J01549]
  3. President's Fund of Minnan Normal University [KJ19021]


Feature selection faces great challenges from ever-larger label spaces and inevitably noisy data. Flat feature selection methods fail to obtain a compact feature subset because of the large number of classes. In addition, these data-driven methods are sensitive to data outliers. Fortunately, many practical tasks organize their classes in a coarse-to-fine hierarchical structure and can therefore be solved with a divide-and-conquer strategy. In this paper, we propose a hierarchical feature selection method driven by data and knowledge (HFSDK), which is robust to data outliers and produces compact feature subsets by splitting the original large label space. First, driven by knowledge of the hierarchical class structure, HFSDK decomposes a large-scale classification task into a group of small subclassification tasks at multiple granularities. Then, in a data-driven process, the corresponding datasets are constructed from the bottom to the top using the class labels of the data. Finally, robust and discriminative feature subsets are selected recursively for these subtasks by eliminating data outliers and adding a semantic relation constraint. Experiments on six real-world datasets validate the superior performance of the proposed method. (C) 2020 Elsevier Inc. All rights reserved.
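The decomposition and bottom-up dataset construction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' HFSDK implementation: the class-tree format (a dict mapping each internal node to its child classes), the function name `decompose`, and the toy labels are hypothetical, and the robust feature selection step (outlier elimination and the semantic relation constraint) is not modeled. Each internal node becomes a subtask whose labels are only that node's children, built by recursing to the leaves first and relabelling the pooled samples on the way up.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical types: a (feature vector, leaf label) sample and a class tree.
Sample = Tuple[Sequence[float], str]
Tree = Dict[str, List[str]]  # internal node -> its child classes


def decompose(tree: Tree, root: str, samples: List[Sample]) -> Dict[str, List[Sample]]:
    """Build one subtask dataset per internal node of the class tree.

    Each subtask keeps only the samples whose leaf label lies under that
    node, relabelled with the node's child on the path to the leaf, so the
    subtask has exactly as many classes as the node has children.
    """
    subtasks: Dict[str, List[Sample]] = {}

    def collect(node: str) -> List[int]:
        # A leaf class: return the indices of samples carrying this label.
        if node not in tree:
            return [i for i, (_, label) in enumerate(samples) if label == node]
        pooled: List[int] = []
        data: List[Sample] = []
        for child in tree[node]:
            # Recurse first, so child subtasks are built before their parent (bottom-up).
            idx = collect(child)
            data.extend((samples[i][0], child) for i in idx)
            pooled.extend(idx)
        subtasks[node] = data
        return pooled

    collect(root)
    return subtasks


if __name__ == "__main__":
    # Toy two-level hierarchy with four fine-grained (leaf) classes.
    tree = {"root": ["animal", "vehicle"],
            "animal": ["cat", "dog"],
            "vehicle": ["car", "bus"]}
    samples = [([0.1, 0.2], "cat"), ([0.3, 0.1], "dog"), ([0.9, 0.8], "car"),
               ([0.7, 0.9], "bus"), ([0.2, 0.2], "cat")]
    for node, data in decompose(tree, "root", samples).items():
        print(node, sorted({label for _, label in data}))
```

With this toy tree, the `animal` subtask is a two-class problem (`cat` vs `dog`) and the root subtask is a two-class problem (`animal` vs `vehicle`), whereas a flat selector would face all four leaf classes at once. A per-subtask feature selector would then run on each of these smaller datasets.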