☆ 4.7 Article

Low-degree term first in ResNet, its variants and the whole neural network family

NEURAL NETWORKS (2022)

Journal

NEURAL NETWORKS

Volume 148, Issue -, Pages 155-165

Publisher

PERGAMON-ELSEVIER SCIENCE LTD

DOI: 10.1016/j.neunet.2022.01.012

Keywords

ResNets; DenseNets; Shallow subnetwork first; Low-degree term first; Taylor expansion

Categories

Computer Science, Artificial Intelligence; Neurosciences

Funding

National Natural Science Foundation of China [61976216, 61672522]

Summary

This paper proposes a novel argument, shallow subnetwork first (SSF), to explain the working mechanism of ResNet and its variants. The experiments show that shallow subnetworks are trained first and play important roles in the networks' performance. They also indicate that DenseNets outperform ResNets because the subnetworks playing vital roles in DenseNets are shallower.

Abstract

To explain the working mechanism of ResNet and its variants, this paper proposes a novel argument of shallow subnetwork first (SSF), essentially low-degree term first (LDTF), which also applies to the whole neural network family. A neural network with shortcut connections behaves as an ensemble of subnetworks of differing depths. Among these subnetworks, the shallow ones are trained first and have a great effect on the performance of the neural network. The shallow subnetworks roughly correspond to low-degree polynomials, while the deep subnetworks roughly correspond to high-degree polynomials. Based on Taylor expansion, SSF is consistent with LDTF. ResNet is in line with Taylor expansion: shallow subnetworks are trained first to keep low-degree terms, avoiding overfitting; deep subnetworks try to maintain high-degree terms, ensuring high description capacity. Experiments on ResNets and DenseNets show that shallow subnetworks are trained first and play important roles in the training of the networks. The experiments also reveal why DenseNets outperform ResNets: the subnetworks playing vital roles in the training of the former are shallower than those in the training of the latter. Furthermore, LDTF can also be used to explain the working mechanism of other ResNet variants (SE-ResNets and SK-ResNets) and common phenomena occurring in many neural networks. © 2022 Elsevier Ltd. All rights reserved.
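The ensemble view in the abstract can be illustrated with a toy calculation. The sketch below is not the paper's experiment: it assumes purely linear, scalar residual blocks of the form y = x + w * x, and the names `residual_stack` and `contribution_by_depth` are hypothetical helpers introduced here. Under that assumption, a stack of n blocks expands exactly into 2^n paths, and the paths that pass through k blocks (depth k) form a term of degree k in the block weights, so shallow subnetworks line up with low-degree terms.

```python
from itertools import combinations
from math import comb, isclose, prod


def residual_stack(weights, x):
    """Forward pass through scalar linear residual blocks: x <- x + w * x."""
    for w in weights:
        x = x + w * x
    return x


def contribution_by_depth(weights, x):
    """Group the 2^n paths of the expanded product prod(1 + w_i) by depth.

    Entry k sums every path that passes through exactly k blocks, which is
    the degree-k term in the weights (an elementary symmetric polynomial).
    """
    n = len(weights)
    return [
        sum(prod(subset) for subset in combinations(weights, k)) * x
        for k in range(n + 1)
    ]


# Illustrative values only (assumptions, not taken from the paper).
weights = [0.5, -0.2, 0.1, 0.3]
x = 2.0

full_output = residual_stack(weights, x)
by_depth = contribution_by_depth(weights, x)
path_counts = [comb(len(weights), k) for k in range(len(weights) + 1)]

for depth, (count, value) in enumerate(zip(path_counts, by_depth)):
    print(f"depth {depth}: {count} paths, contribution {value:+.4f}")
print("sum over depths matches forward pass:", isclose(full_output, sum(by_depth)))
```

With nonlinear blocks the decomposition is no longer exact, so this should be read only as intuition for why subnetwork depth and polynomial degree are related.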
