Article

Low-degree term first in ResNet, its variants and the whole neural network family

Journal

NEURAL NETWORKS
Volume 148, Issue -, Pages 155-165

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.neunet.2022.01.012

Keywords

ResNets; DenseNets; Shallow subnetwork first; Low-degree term first; Taylor expansion

Funding

  1. National Natural Science Foundation of China [61976216, 61672522]

Abstract

This paper proposes a novel argument, shallow subnetwork first (SSF), that explains the working mechanism of ResNet and its variants. Experiments show that shallow subnetworks are trained first and play an important role in a network's performance. They also reveal why DenseNets outperform ResNets: the subnetworks that play vital roles in DenseNets are shallower.
To explain the working mechanism of ResNet and its variants, this paper proposes a novel argument of shallow subnetwork first (SSF), essentially low-degree term first (LDTF), which also applies to the whole neural network family. A neural network with shortcut connections behaves as an ensemble of subnetworks of differing depths. Among these subnetworks, the shallow ones are trained first and have a great effect on the performance of the network. The shallow subnetworks roughly correspond to low-degree polynomials, while the deep subnetworks correspond to high-degree ones; by Taylor expansion, SSF is therefore consistent with LDTF. ResNet is in line with Taylor expansion: shallow subnetworks are trained first to capture the low-degree terms, avoiding overfitting; deep subnetworks try to maintain the high-degree terms, ensuring high descriptive capacity. Experiments on ResNets and DenseNets show that shallow subnetworks are trained first and play important roles in the training of the networks. The experiments also reveal why DenseNets outperform ResNets: the subnetworks playing vital roles in the training of the former are shallower than those in the training of the latter. Furthermore, LDTF can also be used to explain the working mechanism of other ResNet variants (SE-ResNets and SK-ResNets) and common phenomena occurring in many neural networks. (C) 2022 Elsevier Ltd. All rights reserved.
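For context, the "ensemble of subnetworks" view can be made concrete by unrolling a two-block ResNet with identity shortcuts. This is a minimal sketch of the standard unraveling argument, not an equation taken from the paper:

    \[
    \begin{aligned}
    y_1 &= x + f_1(x), \\
    y_2 &= y_1 + f_2(y_1) = x + f_1(x) + f_2\bigl(x + f_1(x)\bigr).
    \end{aligned}
    \]

The output mixes four paths of depths 0, 1, 1, and 2: x, f_1(x), f_2(x), and, schematically, f_2(f_1(x)). If each residual branch behaves locally like a polynomial of degree d (its truncated Taylor expansion), a path through k branches contributes terms of degree up to d^k, so shallow paths carry the low-degree terms and the deepest path carries the highest-degree ones, matching the SSF/LDTF correspondence described in the abstract.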
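The degree argument can also be checked symbolically. Below is a hypothetical toy sketch, assuming each residual branch is a degree-2 polynomial standing in for a block's truncated Taylor expansion; it is not code from the paper:

    import sympy as sp

    x = sp.symbols('x')
    # Toy residual branch: a degree-2 polynomial stand-in (an assumption
    # for illustration) for the local Taylor expansion of a real block.
    f = sp.Rational(1, 10) * x**2

    y1 = x + f                  # block 1: y = x + f(x)
    y2 = y1 + f.subs(x, y1)     # block 2: y = y + f(y)

    poly = sp.expand(y2)
    print(sp.Poly(poly, x).degree())  # 4: the path through both branches
    print(poly)                       # terms x, x**2/5, x**3/50, x**4/1000

In this toy, the expanded output x + x**2/5 + x**3/50 + x**4/1000 shows the pattern the argument predicts: the deepest path contributes the degree-4 term, and the high-degree coefficients are small, so the low-degree terms from the shallow paths dominate.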
