Journal
IEEE TRANSACTIONS ON CYBERNETICS
Volume 52, Issue 4, Pages 2070-2081
Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TCYB.2020.3007506
Keywords
Training; Knowledge engineering; Neural networks; Collaboration; Computational modeling; Task analysis; Computer vision; deep learning; knowledge distillation (KD); neural-networks compression
Funding
- National Natural Science Foundation of China [U1706218, 61971388]
- Major Program of Natural Science Foundation of Shandong Province [ZR2018ZB0852]
This article introduces a new collaborative teaching KD strategy that utilizes two special teachers. One teacher is trained from scratch to assist the student step by step, while the other pretrained teacher guides the student to focus on critical regions. The combination of knowledge from these two special teachers significantly improves the performance of the student network in KD.
High storage and computational costs prevent deep neural networks from being deployed on resource-constrained devices. Knowledge distillation (KD) aims to train a compact student network by transferring knowledge from a larger pretrained teacher model. However, most existing KD methods ignore the valuable information produced during the training process and rely only on the final training results. In this article, we propose a new collaborative teaching KD (CTKD) strategy that employs two special teachers. Specifically, one teacher trained from scratch (i.e., the scratch teacher) assists the student step by step using its temporary outputs, forcing the student to approach the optimal path toward the final logits with high accuracy. The other, pretrained teacher (i.e., the expert teacher) guides the student to focus on critical regions that are more useful for the task. Combining the knowledge from these two special teachers significantly improves the performance of the student network in KD. Experiments on the CIFAR-10, CIFAR-100, SVHN, Tiny ImageNet, and ImageNet datasets verify that the proposed KD method is efficient and achieves state-of-the-art performance.
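The abstract describes combining two distillation signals: soft-label matching against the scratch teacher's temporary logits, and attention-map matching against the expert teacher. The sketch below illustrates one common way such a combined loss can be formed (temperature-scaled KL divergence plus normalized attention matching, as in standard KD and attention-transfer work); the weights `alpha`, `beta`, the temperature `T`, and the exact loss shape are illustrative assumptions, not the paper's precise formulation.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_kl(student_logits, teacher_logits, T=4.0):
    """Softened KL(teacher || student), scaled by T^2 as in classic KD."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return (T ** 2) * (p * (np.log(p + 1e-12) - np.log(q + 1e-12))).sum(axis=-1).mean()

def ctkd_loss(student_logits, scratch_logits, student_attn, expert_attn,
              labels, alpha=0.5, beta=0.5, T=4.0):
    """Illustrative combined loss (hypothetical weights alpha/beta):
    cross-entropy on ground-truth labels
    + KL to the scratch teacher's temporary logits
    + L2 matching of normalized attention maps from the expert teacher."""
    q = softmax(student_logits)
    ce = -np.log(q[np.arange(len(labels)), labels] + 1e-12).mean()
    kd = kd_kl(student_logits, scratch_logits, T)

    def normalize(a):
        # Flatten each sample's attention map and L2-normalize it,
        # as is common in attention-transfer methods.
        flat = a.reshape(a.shape[0], -1)
        return flat / (np.linalg.norm(flat, axis=1, keepdims=True) + 1e-12)

    attn = ((normalize(student_attn) - normalize(expert_attn)) ** 2).sum(axis=1).mean()
    return ce + alpha * kd + beta * attn
```

At each training step, `scratch_logits` would come from the concurrently trained scratch teacher and `expert_attn` from the frozen pretrained expert teacher, so the student receives both stepwise guidance and region-level guidance.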