Journal
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS
Volume 33, Issue 1, Pages 257-269
Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TNNLS.2020.3027634
Keywords
Training; Nonlinear distortion; Data models; Neural networks; Knowledge engineering; Network architecture; Generalization error; network compression; representation invariance; self-distillation (SD)
Funding
- Major Project for New Generation of Artificial Intelligence (AI) [2018AAA0100400]
- National Natural Science Foundation of China (NSFC) [61836014, 61721004]
- Ministry of Science and Technology of China
This article proposes an elegant self-distillation mechanism that directly obtains high-accuracy models without the need for an assistive model. It learns data representation invariance and effectively reduces generalization error for various network architectures, surpassing existing model distillation methods with little extra training effort.
To harvest small networks with high accuracies, most existing methods mainly utilize compression techniques, such as low-rank decomposition and pruning, to compress a trained large model into a small network, or transfer knowledge from a powerful large model (teacher) to a small network (student). Despite their success in generating small models of high performance, the dependence on accompanying assistive models complicates the training process and increases memory and time costs. In this article, we propose an elegant self-distillation (SD) mechanism to obtain high-accuracy models directly, without going through an assistive model. Inspired by invariant recognition in the human visual system, we observe that different distorted instances of the same input should possess similar high-level data representations. Thus, we can learn data representation invariance between different distorted versions of the same sample. Specifically, in our SD-based learning algorithm, a single network uses the maximum mean discrepancy (MMD) metric to learn global feature consistency and the Kullback-Leibler (KL) divergence to constrain posterior class probability consistency across the different distorted branches. Extensive experiments on the MNIST, CIFAR-10/100, and ImageNet data sets demonstrate that the proposed method can effectively reduce the generalization error for various network architectures, such as AlexNet, VGGNet, ResNet, Wide ResNet, and DenseNet, and outperforms existing model distillation methods with little extra training effort.
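The two consistency terms described in the abstract (MMD on global features, KL divergence on class posteriors of distorted branches) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the function names, the RBF kernel choice for the MMD estimate, the symmetric form of the KL term, and the weights `lam` and `mu` are all assumptions for illustration.

```python
import numpy as np

def rbf_mmd2(x, y, sigma=1.0):
    """Biased estimate of squared maximum mean discrepancy between
    two feature batches x, y (shape [n, d]) with an RBF kernel.
    Kernel choice is an assumption; the paper's exact kernel may differ."""
    def gram(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))
    return gram(x, x).mean() + gram(y, y).mean() - 2.0 * gram(x, y).mean()

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))  # numerically stable
    return e / e.sum(axis=1, keepdims=True)

def kl_div(p, q, eps=1e-12):
    """Mean KL(p || q) over the batch, with eps for numerical safety."""
    return (p * (np.log(p + eps) - np.log(q + eps))).sum(axis=1).mean()

def sd_consistency_loss(feat_a, feat_b, logits_a, logits_b, lam=1.0, mu=1.0):
    """Consistency loss between two distorted branches of the same samples:
    MMD aligns global features; a symmetrized KL term (an assumption)
    aligns the posterior class probabilities."""
    p, q = softmax(logits_a), softmax(logits_b)
    return lam * rbf_mmd2(feat_a, feat_b) + mu * 0.5 * (kl_div(p, q) + kl_div(q, p))
```

In training, this consistency term would be added to the usual cross-entropy loss of each branch; identical branch outputs drive the term to zero, so it penalizes only representation disagreement between distorted views.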