4.2 Article

Active learning of constraints for weighted feature selection

Journal

ADVANCES IN DATA ANALYSIS AND CLASSIFICATION
Volume 15, Issue 2, Pages 337-377

Publisher

SPRINGER HEIDELBERG
DOI: 10.1007/s11634-020-00408-5

Keywords

Feature selection; Active learning; Pairwise constraint selection; Constraint propagation; Graph Laplacian; Uncertainty reduction; Matrix perturbation

Funding

  1. Agence universitaire de la Francophonie (AUF)
  2. University of the Littoral Opal Coast (ULCO) in France
  3. National Council For Scientific Research in Lebanon, ARCUS E2D2 project

Pairwise constraints were initially suggested to enhance clustering algorithms and have more recently been explored for feature selection. This paper proposes a framework for actively selecting and propagating constraints, aiming to avoid redundant, unnecessary, or harmful constraints and to increase supervision information while reducing human labor costs. Experimental results show that the proposal compares favorably with other known feature selection methods.
Pairwise constraints, a cheaper kind of supervision information that does not require revealing the class labels of data points, were initially suggested to enhance the performance of clustering algorithms. More recently, researchers have become interested in using them for feature selection. However, in most current methods, pairwise constraints are provided passively and generated randomly over multiple algorithmic runs whose results are then averaged. This creates a need for a large number of constraints that may be redundant, unnecessary, and in some circumstances even harmful to the algorithm's performance. It also masks the individual effect of each constraint set and imposes a human labor cost. Therefore, in this paper, we propose a framework for actively selecting and then propagating constraints for feature selection. To that end, we exploit the graph Laplacian defined on the similarity matrix. We assume that when a small perturbation of the similarity value between a pair of data points leads to a better-separated cluster indicator based on the second eigenvector of the graph Laplacian, this pair is expected to be a pairwise query of higher and more significant impact. Constraint propagation, in turn, increases the supervision information while decreasing the cost of human labor. Finally, experimental results validate our proposal and show that it compares favorably with other known feature selection methods.
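
The selection idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm. It assumes a Gaussian similarity matrix and an unnormalized Laplacian L = D - S, and it recomputes the eigenvectors by brute force for every candidate pair instead of using the paper's matrix perturbation analysis. The function names (gaussian_similarity, fiedler_vector, separation, rank_pairs), the parameters sigma and delta, and the separation score itself are hypothetical choices made only for illustration.

```python
import numpy as np


def gaussian_similarity(X, sigma=1.0):
    """Dense Gaussian similarity matrix with a zero diagonal (assumed kernel)."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    S = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(S, 0.0)
    return S


def fiedler_vector(S):
    """Eigenvector of the second-smallest eigenvalue of L = D - S."""
    L = np.diag(S.sum(axis=1)) - S
    _, vecs = np.linalg.eigh(L)  # eigh returns eigenvalues in ascending order
    return vecs[:, 1]


def separation(v):
    """Hypothetical score: gap between the two sign groups of v,
    divided by their within-group spread."""
    pos, neg = v[v >= 0], v[v < 0]
    if len(pos) == 0 or len(neg) == 0:
        return 0.0
    gap = pos.mean() - neg.mean()
    spread = pos.std() + neg.std() + 1e-12  # avoid division by zero
    return gap / spread


def rank_pairs(S, delta=0.05):
    """Rank pairs (i, j) by how much a small change of S[i, j]
    improves the separation score of the second eigenvector.

    Both directions (+delta and -delta) are tried and the better one kept;
    this is an illustrative assumption, not a detail taken from the paper.
    """
    base = separation(fiedler_vector(S))
    n = S.shape[0]
    scored = []
    for i in range(n):
        for j in range(i + 1, n):
            best_gain = -np.inf
            for d in (delta, -delta):
                P = S.copy()
                # Keep the perturbed matrix symmetric and within [0, 1].
                P[i, j] = P[j, i] = np.clip(S[i, j] + d, 0.0, 1.0)
                gain = separation(fiedler_vector(P)) - base
                best_gain = max(best_gain, gain)
            scored.append(((i, j), best_gain))
    # Highest gain first: these pairs would be queried first.
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Under these assumptions, calling rank_pairs(gaussian_similarity(X)) on a data matrix X returns every pair with its estimated gain, and the first few entries are the candidate queries to present to an annotator. The brute-force loop performs one eigendecomposition per pair and direction, so it is practical only for small datasets.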

