Article

Nearest neighbor regression in the presence of bad hubs

Journal

KNOWLEDGE-BASED SYSTEMS
Volume 86, Pages 250-260

Publisher

ELSEVIER SCIENCE BV
DOI: 10.1016/j.knosys.2015.06.010

Keywords

Nearest neighbor regression; Hubs; Intrinsic dimensionality; Machine learning

Funding

  1. Hungarian Scientific Research Fund [OTKA 111710 PD]
  2. Janos Bolyai Research Scholarship of Hungarian Academy of Sciences
  3. FUTURICT [TAMOP-4.2.2.C-11/1/KONV]
  4. [39859]

Abstract

Prediction on a numeric scale, i.e., regression, is one of the most prominent machine learning tasks, with applications in finance, medicine, and the social and natural sciences. Due to its simplicity, theoretical performance guarantees, and successful real-world applications, one of the most popular regression techniques is k nearest neighbor regression. However, k nearest neighbor approaches are affected by the presence of bad hubs, a recently observed phenomenon according to which some instances are similar to surprisingly many other instances and have a detrimental effect on overall prediction performance. This paper is the first to study bad hubs in the context of regression. We propose hubness-aware nearest neighbor regression schemes and evaluate them on publicly available real-world datasets from various domains. Our results show that the proposed approaches outperform various other regression schemes such as kNN regression, regression trees, and neural networks. We also evaluate the proposed approaches in the presence of label noise, because tolerance to noise is one of the most relevant aspects from the point of view of real-world applications. In particular, we perform experiments under the assumption of conventional Gaussian label noise and an adapted version of the recently proposed hubness-proportional random label noise. (C) 2015 Elsevier B.V. All rights reserved.
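The hubness phenomenon the abstract refers to is commonly quantified by the k-occurrence N_k(x): how often a point x appears among the k nearest neighbors of other points. The sketch below, which is only an illustration and not the scheme proposed in the paper, computes k-occurrences and shows one simple way a kNN regressor could down-weight strong hubs (the function names and the inverse weighting are assumptions for demonstration):

```python
import numpy as np

def k_occurrences(X, k):
    """Count how often each point appears among the k nearest
    neighbors of the other points (its k-occurrence, N_k)."""
    # Pairwise squared Euclidean distances between all points.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)          # a point is not its own neighbor
    nn = np.argsort(d2, axis=1)[:, :k]    # k nearest neighbors per point
    return np.bincount(nn.ravel(), minlength=len(X))

def knn_regress(X_train, y_train, x_query, k, weights=None):
    """Plain (optionally weighted) kNN regression prediction."""
    d2 = ((X_train - x_query) ** 2).sum(-1)
    idx = np.argsort(d2)[:k]
    w = np.ones(k) if weights is None else weights[idx]
    return float(np.average(y_train[idx], weights=w))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))            # higher dimensionality promotes hubness
y = X[:, 0] + 0.1 * rng.normal(size=200)
Nk = k_occurrences(X, k=5)
# Illustrative only: down-weight points with unusually large N_k.
w = 1.0 / (1.0 + Nk)
pred = knn_regress(X, y, X[0], k=5, weights=w)
```

Each of the 200 points contributes exactly k = 5 neighbor slots, so `Nk.sum()` is 1000; in high-dimensional data the distribution of `Nk` becomes skewed, with a few hub points accumulating many occurrences.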

