Article

Stochastic Gradient Descent for Nonconvex Learning Without Bounded Gradient Assumptions

Journal

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TNNLS.2019.2952219

Keywords

Convergence; Training; Stochastic processes; Optimization; Loss measurement; Learning systems; Learning theory; nonconvex optimization; Polyak-Lojasiewicz condition; stochastic gradient descent (SGD)

Funding

  1. National Key Research and Development Program of China [2017YFB1003102]
  2. National Natural Science Foundation of China [11571078, 11671307, 61672478, 61806091]
  3. Program for University Key Laboratory of Guangdong Province [2017KSYS008]
  4. Program for Guangdong Introducing Innovative and Entrepreneurial Teams [2017ZT07X386]
  5. Alexander von Humboldt Foundation


Stochastic gradient descent (SGD) is a popular and efficient method with wide applications in training deep neural nets and other nonconvex models. While the behavior of SGD is well understood in the convex learning setting, the existing theoretical results for SGD applied to nonconvex objective functions are far from mature. For example, existing results require imposing a nontrivial assumption of uniform boundedness of gradients for all iterates encountered in the learning process, which is hard to verify in practical implementations. In this article, we establish a rigorous theoretical foundation for SGD in nonconvex learning by showing that this boundedness assumption can be removed without affecting convergence rates, and by relaxing the standard smoothness assumption to Hölder continuity of gradients. In particular, we establish sufficient conditions for almost sure convergence as well as optimal convergence rates for SGD applied to both general nonconvex and gradient-dominated objective functions. Linear convergence is further derived in the case of zero variance.
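The setting described above can be illustrated with a minimal sketch: SGD on a one-dimensional nonconvex objective satisfying the Polyak-Lojasiewicz (gradient-dominated) condition, with an unbiased noisy gradient oracle. The test function f(x) = x^2 + 3 sin^2(x), the noise model, and the step-size schedule below are illustrative assumptions, not the exact choices analyzed in the paper; note the true gradient grows linearly in x, so no uniform gradient bound holds over the whole line.

```python
import math
import random

def f(x):
    # Nonconvex but gradient-dominated (Polyak-Lojasiewicz) objective;
    # its unique stationary point is the global minimizer x* = 0, f(x*) = 0.
    return x * x + 3.0 * math.sin(x) ** 2

def grad(x):
    return 2.0 * x + 3.0 * math.sin(2.0 * x)

def sgd(x0, n_steps=3000, eta0=0.1, noise_std=0.1, seed=0):
    """SGD with a polynomially decaying step size eta_t = eta0 / (1 + t/20).

    Gradients are perturbed by Gaussian noise to mimic a stochastic
    first-order oracle with bounded variance; the gradients themselves
    are unbounded over the domain."""
    rng = random.Random(seed)
    x = x0
    for t in range(n_steps):
        eta = eta0 / (1.0 + t / 20.0)
        g = grad(x) + rng.gauss(0.0, noise_std)  # unbiased noisy gradient
        x -= eta * g
    return x

x_final = sgd(x0=2.0)
print(f"f(x0) = {f(2.0):.4f}, f(x_T) = {f(x_final):.6f}")
```

With the decaying schedule, the objective value settles near the global minimum despite the nonconvexity, consistent with the gradient-dominated rates discussed in the abstract; a constant step size would instead plateau at a noise floor proportional to the step size and the gradient variance.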



Recommended

Article Mathematics, Applied

Convergence of online mirror descent

Yunwen Lei, Ding-Xuan Zhou

APPLIED AND COMPUTATIONAL HARMONIC ANALYSIS (2020)

Article Computer Science, Information Systems

Data-Dependent Generalization Bounds for Multi-Class Classification

Yunwen Lei, Urun Dogan, Ding-Xuan Zhou, Marius Kloft

IEEE TRANSACTIONS ON INFORMATION THEORY (2019)

Article Mathematics, Applied

Analysis of Singular Value Thresholding Algorithm for Matrix Completion

Yunwen Lei, Ding-Xuan Zhou

JOURNAL OF FOURIER ANALYSIS AND APPLICATIONS (2019)

Article Automation & Control Systems

Adaptive nonlinear observer-based sliding mode control of robotic manipulator for handling an unknown payload

Guiying Li, Shuyang Wang, Zhigang Yu

Summary: This article introduces a novel approach to control robotic manipulators with an unknown constant payload, utilizing a nonlinear disturbance observer to estimate external forces induced by the payload. An adaptive technique is used to design the observer gain, which is then integrated with sliding mode control to alleviate chattering. The effectiveness of the proposed methods is validated through simulation results.

PROCEEDINGS OF THE INSTITUTION OF MECHANICAL ENGINEERS PART I-JOURNAL OF SYSTEMS AND CONTROL ENGINEERING (2021)

Article Mathematics, Applied

Differentially private SGD with non-smooth losses

Puyu Wang, Yunwen Lei, Yiming Ying, Hai Zhang

Summary: This paper investigates the privacy and generalization guarantees of differentially private stochastic gradient descent algorithms in stochastic convex optimization with non-smooth convex losses, relaxing the strict assumptions of prior work through output and gradient perturbations.

APPLIED AND COMPUTATIONAL HARMONIC ANALYSIS (2022)

Article Computer Science, Artificial Intelligence

Learning Rates for Stochastic Gradient Descent With Nonconvex Objectives

Yunwen Lei, Ke Tang

Summary: This paper develops novel learning rates of SGD for nonconvex learning by presenting high-probability bounds for both computational and statistical errors. It shows that the complexity of SGD iterates grows in a controllable manner with respect to the iteration number, shedding insights on implicit regularization. By also connecting the study to Rademacher chaos complexities, it slightly refines existing studies on the uniform convergence of gradients.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (2021)

Proceedings Paper Computer Science, Artificial Intelligence

Noise-Efficient Learning of Differentially Private Partitioning Machine Ensembles

Zhanliang Huang, Yunwen Lei, Ata Kaban

Summary: This paper presents a framework that leverages unlabelled data to reduce noise requirement and improve predictive performance in differentially private decision forests. The framework includes a median splitting criterion for balanced leaves, a geometric privacy budget allocation technique, and a random sampling technique for accurate computation of private splitting points.

MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2022, PT IV (2023)

Article Computer Science, Artificial Intelligence

Stage-Wise Magnitude-Based Pruning for Recurrent Neural Networks

Guiying Li, Peng Yang, Chao Qian, Richang Hong, Ke Tang

Summary: This article proposes a novel stage-wise pruning method for recurrent neural networks (RNN), which can effectively prune both feedforward and RNN layers. Experimental results show that the proposed method performs significantly better than commonly used RNN pruning methods.

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS (2022)

Proceedings Paper Computer Science, Artificial Intelligence

Fine-grained Generalization Analysis of Vector-valued Learning

Liang Wu, Antoine Ledent, Yunwen Lei, Marius Kloft

Summary: This paper initiates the generalization analysis of regularized vector-valued learning algorithms by presenting bounds with a mild dependency on the output dimension and a fast rate on the sample size. The discussions relax existing assumptions on the restrictive constraint of hypothesis spaces, smoothness of loss functions, and low-noise conditions.

THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE (2021)

Proceedings Paper Computer Science, Artificial Intelligence

Norm-Based Generalisation Bounds for Deep Multi-Class Convolutional Neural Networks

Antoine Ledent, Waleed Mustafa, Yunwen Lei, Marius Kloft

Summary: The study presents generalization error bounds for deep learning with two key improvements, including no explicit dependence on the number of classes and adapting Rademacher analysis of DNNs to incorporate weight sharing. The bounds scale based on the norms of the parameter matrices, rather than the number of parameters, and show that each convolutional filter contributes only once to the bound.

THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE (2021)

Article Automation & Control Systems

Generalization Performance of Multi-pass Stochastic Gradient Descent with Convex Loss Functions

Yunwen Lei, Ting Hu, Ke Tang

Summary: This paper provides optimal capacity-independent and capacity-dependent learning rates for SGD with general convex loss functions, without the need for bounded subgradient or smoothness assumptions, and stated with high probability. This improvement is achieved through a refined estimate on the norm of SGD iterates based on martingale analysis and concentration inequalities on empirical processes.

JOURNAL OF MACHINE LEARNING RESEARCH (2021)

Proceedings Paper Automation & Control Systems

Quaternion-based robust sliding mode control for spacecraft attitude tracking

Zhigang Yu, Guiying Li

PROCEEDINGS OF THE 2019 31ST CHINESE CONTROL AND DECISION CONFERENCE (CCDC 2019) (2019)

Article Information Science & Library Science

Accelerate proposal generation in R-CNN methods for fast pedestrian extraction

Juncheng Wang, Guiying Li

ELECTRONIC LIBRARY (2019)
