Article

Dependence-biased clustering for variable selection with random forests

Journal

PATTERN RECOGNITION
Volume 96

Publisher

ELSEVIER SCI LTD
DOI: 10.1016/j.patcog.2019.106980

Keywords

Variable selection; Random forest; Permutation importance; Regression; Classification; Clustering

We introduce a method for selecting a small subset of informative, non-redundant predictors from a set of input variables, given an output variable. The core of this method is a novel measure of variable importance, which is an enhancement of the so-called conditional permutation importance (CPI). In CPI, the importance of an input variable is measured by the expected increase in the prediction error of a random forest (RF) when that variable is randomly permuted within certain groups of observations. While CPI obtains these groups from the stochastic recursive partitions that the RF carries out on the input space, our measure relies on a new approach that groups observations by means of a special form of clustering, which optimally leverages the structure of dependencies existing between input variables. We show that our measure can be effectively used to recursively eliminate both unimportant and redundant input variables. Extensive experimental results illustrate the effectiveness of our method in comparison with many RF-based methods for variable selection. (C) 2019 Published by Elsevier Ltd.
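The core mechanic described above, permuting one variable only within groups of observations and measuring the resulting increase in prediction error, can be sketched as follows. This is an illustrative toy, not the authors' method: the grouping is an arbitrary stand-in for the paper's dependence-biased clustering, and the fitted model is a hand-coded predictor rather than a random forest.

```python
import numpy as np

def within_group_permutation_importance(predict, X, y, feature, groups,
                                        n_repeats=10, seed=None):
    """Mean increase in MSE when `feature` is permuted within each group.

    `predict` is any fitted model; `groups` is any precomputed partition of
    the observations (in the paper it would come from clustering driven by
    the dependencies between input variables).
    """
    rng = np.random.default_rng(seed)
    base_err = np.mean((predict(X) - y) ** 2)
    increases = []
    for _ in range(n_repeats):
        Xp = X.copy()
        for g in np.unique(groups):
            idx = np.where(groups == g)[0]
            # shuffle the feature's values only among observations in group g
            Xp[idx, feature] = Xp[rng.permutation(idx), feature]
        increases.append(np.mean((predict(Xp) - y) ** 2) - base_err)
    return float(np.mean(increases))

# Toy demo: y depends on feature 0 only, so permuting feature 0 should
# raise the error while permuting the irrelevant feature 2 should not.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0]
predict = lambda X: 2.0 * X[:, 0]           # stands in for a fitted model
groups = (X[:, 1] > 0).astype(int)          # stand-in grouping (hypothetical)
imp0 = within_group_permutation_importance(predict, X, y, 0, groups, seed=1)
imp2 = within_group_permutation_importance(predict, X, y, 2, groups, seed=1)
print(imp0 > imp2)
```

In a selection loop of the kind the abstract describes, scores like these would be recomputed after each elimination step, so that a variable made redundant by the remaining predictors can also be dropped.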
