Journal
PATTERN RECOGNITION
Volume 114, Issue -, Pages -
Publisher: ELSEVIER SCI LTD
DOI: 10.1016/j.patcog.2020.107526
Keywords
Multi-label classification; Binary relevance; Nearest neighbor; Adaptive distance measure; Prototype weighting
The generalized Prototype Weighting (PW) scheme introduced in this paper supports objective functions beyond the error rate, including the F-measure, and uses gradient descent to learn its parameters, significantly improving performance in multi-label classification.
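The F-measure referred to above is computed directly from the elements of the confusion matrix, which is what allows it to serve as an optimization objective in place of the error rate. A minimal sketch (the function name `f_measure` is illustrative, not from the paper):

```python
def f_measure(tp, fp, fn, beta=1.0):
    # F-measure from confusion-matrix counts:
    # F_beta = (1 + beta^2) * TP / ((1 + beta^2) * TP + beta^2 * FN + FP)
    # With beta = 1 this is the usual harmonic mean of precision and recall.
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom > 0 else 0.0

# Example: 8 true positives, 2 false positives, 2 false negatives.
print(f_measure(8, 2, 2))  # 0.8
```

Unlike the error rate, this objective weights the (rare) positive class through TP, FP and FN only, which is why it is better suited to the imbalanced per-label problems that binary relevance produces.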
In multi-label classification, each instance is associated with a set of pre-specified labels. A common approach is the Binary Relevance (BR) paradigm, which learns each label separately with a base classifier. Using k-Nearest Neighbor (kNN) as the base classifier (denoted BRkNN) is a simple, descriptive and powerful approach. However, binary relevance induces a highly imbalanced view of the dataset, and kNN is known to perform poorly on imbalanced data. One way to deal with this is to define the distance function in parametric form and use the training data to adjust the parameters (i.e., to adjust the boundaries between classes) by optimizing a performance measure suited to imbalanced data, e.g., the F-measure. The Prototype Weighting (PW) scheme presented in the literature (Paredes & Vidal, 2006) uses gradient descent to set the parameters by minimizing the classification error rate on the training data. This paper presents a generalized version of PW. First, instead of minimizing the error rate as in PW, the generalized PW also supports other objective functions built from the elements of the confusion matrix (including the F-measure). Second, PW, originally presented for 1NN, is extended to the general case of kNN (i.e., k >= 1). For problems with highly overlapping classes, this is expected to perform better, since a value of k > 1 produces smoother decision boundaries, which in turn can improve generalization. In multi-label problems with many labels, or problems with highly overlapping classes, the proposed generalized PW is expected to significantly improve performance, since many decision boundaries are involved. The performance of the proposed method has been compared with state-of-the-art methods in multi-label classification, comprising 6 lazy classifiers based on kNN. Experiments show that the proposed method significantly outperforms the other methods. (c) 2020 Published by Elsevier Ltd.
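The BRkNN setup the abstract describes can be sketched as follows: each training instance acts as a prototype carrying a learned multiplicative weight, query points are ranked by weighted distance, and each label is then predicted independently by majority vote among the k nearest prototypes. This is a minimal illustration only; the weights are fixed here, whereas in the PW scheme they would be learned by gradient descent on an objective such as the F-measure, and all function names are hypothetical:

```python
import math

def weighted_distance(x, prototype, weight):
    # Prototype-weighted Euclidean distance: each prototype p carries a
    # multiplicative weight w_p. Shrinking w_p enlarges p's region of
    # influence, which is how PW shifts decision boundaries between classes.
    return weight * math.dist(x, prototype)

def br_knn_predict(x, prototypes, label_matrix, weights, k=3):
    # Binary Relevance kNN: rank prototypes by weighted distance, then
    # predict each label independently by majority vote among the k
    # nearest prototypes (one binary vote per label).
    order = sorted(range(len(prototypes)),
                   key=lambda i: weighted_distance(x, prototypes[i], weights[i]))
    neighbors = order[:k]
    n_labels = len(label_matrix[0])
    return [int(sum(label_matrix[i][j] for i in neighbors) > k / 2)
            for j in range(n_labels)]

# Toy data: two clusters, two labels (binary indicator rows per prototype).
prototypes = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
label_matrix = [[1, 0], [1, 0], [0, 1], [0, 1]]
weights = [1.0, 1.0, 1.0, 1.0]  # uniform weights reduce to plain BRkNN
print(br_knn_predict((0.2, 0.5), prototypes, label_matrix, weights, k=3))  # [1, 0]
```

With uniform weights this is ordinary BRkNN; the generalized PW of the paper would instead tune `weights` to optimize a confusion-matrix-based objective on the training data, giving each of the many per-label decision boundaries a chance to move.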