Journal
IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART C-APPLICATIONS AND REVIEWS
Volume 42, Issue 6, Pages 1828-1841
Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TSMCC.2012.2227471
Keywords
AdaBoost; dynamic weighting scheme; overfitting; statistical analysis; variable loss function
Funding
- National Natural Science Foundation of China [61203176, 61174161]
- Key Research Project of Fujian Province of China [2009H0044]
- Fundamental Research Funds for the Central Universities in China, Xiamen University [2011121047201112G018, CXB2011035]
- Natural Sciences and Engineering Research Council of Canada
Abstract
AdaBoost improves a given learning algorithm's classification accuracy by combining its hypotheses. Adaptivity, one of AdaBoost's key advantages, drives it to maximize the smallest margin, which gives it good generalization ability. However, when the samples with large negative margins are noisy or atypical, the maximized margin is effectively a hard margin: the adaptive behavior makes AdaBoost sensitive to sampling fluctuations and prone to overfitting. Traditional schemes therefore prevent AdaBoost from overfitting by heavily damping the influence of samples with large negative margins. However, samples with large negative margins are not always noisy or atypical, so these traditional schemes may not be well founded. To learn a classifier with high generalization performance while preventing overfitting, it is necessary to perform statistical analysis on the margins of the training samples. Here, the Hoeffding inequality is adopted as a statistical tool to divide the training samples into reliable samples and temporarily unreliable samples. A new boosting algorithm, named DAdaBoost, is introduced to handle reliable and temporarily unreliable samples separately. Since DAdaBoost adjusts its weighting scheme dynamically, its loss function is not fixed; it is a series of nonconvex functions that gradually approach the 0-1 loss as the algorithm evolves. By defining a virtual classifier, the dynamically adjusted weighting scheme is unified into the progress of DAdaBoost, and an upper bound on the training error is derived. Experiments on both synthetic and real-world data show that DAdaBoost has many merits; based on these experiments, we conclude that DAdaBoost can effectively prevent AdaBoost from overfitting.
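The paper itself does not publish pseudocode in this record, but the core idea stated in the abstract, using the Hoeffding inequality to separate reliable samples from temporarily unreliable ones based on their margins, can be sketched as follows. This is an illustrative reconstruction, not the authors' exact algorithm: the function names `hoeffding_bound` and `split_samples`, the confidence level `delta`, and the rule "flag a sample whose mean margin is confidently negative" are all assumptions made for the sketch.

```python
import math

def hoeffding_bound(n, delta, value_range=2.0):
    """Hoeffding deviation for the mean of n bounded observations.

    Margins of a normalized ensemble lie in [-1, 1], so the range is 2.
    With probability at least 1 - delta, the empirical mean is within
    this bound of the true mean.
    """
    return value_range * math.sqrt(math.log(2.0 / delta) / (2.0 * n))

def split_samples(margin_history, delta=0.05):
    """Split training samples into reliable / temporarily unreliable.

    margin_history: one list of per-round margins for each sample.
    A sample is flagged as temporarily unreliable (hypothetical rule)
    when even the upper end of the Hoeffding confidence interval on
    its mean margin is still negative.
    """
    reliable, unreliable = [], []
    for i, margins in enumerate(margin_history):
        n = len(margins)
        mean = sum(margins) / n
        eps = hoeffding_bound(n, delta)
        if mean + eps < 0:
            unreliable.append(i)   # confidently large negative margin
        else:
            reliable.append(i)     # margin not confidently negative
    return reliable, unreliable

# Usage: sample 0 has consistently positive margins, sample 1
# consistently large negative margins over 50 boosting rounds.
history = [[0.5] * 50, [-0.9] * 50]
rel, unrel = split_samples(history)
print(rel, unrel)
```

In the actual DAdaBoost algorithm, samples flagged this way would have their weights handled by the dynamic weighting scheme rather than being damped by a fixed convex loss; the sketch only shows the statistical test that motivates the split.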