Article
Computer Science, Artificial Intelligence
Li Shen, Congliang Chen, Fangyu Zou, Zequn Jie, Ju Sun, Wei Liu
Summary: Integrating adaptive learning rates and momentum techniques into stochastic gradient descent has led to various efficient adaptive stochastic algorithms, such as AdaGrad, RMSProp, Adam, and AccAdaGrad. This paper proposes a weighted AdaGrad algorithm called AdaUSM that incorporates a unified momentum scheme and a novel weighted adaptive learning rate. It shows that AdaUSM achieves an $\mathcal{O}(\log(T)/\sqrt{T})$ convergence rate in the nonconvex stochastic setting when the weights grow polynomially. Furthermore, it provides a new perspective for understanding Adam and RMSProp by showing that their adaptive learning rates correspond to exponentially growing weights in AdaUSM. Comparative experiments on deep learning models and datasets are also conducted.
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS
(2023)
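To make the weighting idea concrete, here is a minimal sketch, assuming a generic gradient oracle `grad_fn` and a polynomial weight exponent `a` (both illustrative), of an AdaGrad-style step that combines a weighted squared-gradient accumulator with heavy-ball momentum; it is not the authors' exact AdaUSM algorithm.

```python
import numpy as np

# Initial optimizer state; works for scalar or NumPy-array parameters.
state = {"t": 0, "v": 0.0, "W": 0.0, "m": 0.0}

def weighted_adagrad_momentum_step(x, grad_fn, state, lr=0.1, beta=0.9, a=1.0, eps=1e-8):
    """One step of a weighted-AdaGrad update with heavy-ball momentum.

    Illustrative only: squared gradients are accumulated with polynomially
    growing weights w_t = t**a, echoing the weighting behind the
    O(log(T)/sqrt(T)) rate described above; the exact AdaUSM unified-momentum
    scheme and step-size normalization are not reproduced here.
    """
    state["t"] += 1
    t = state["t"]
    g = grad_fn(x)
    w = t ** a                                     # polynomially growing weight
    state["v"] += w * g ** 2                       # weighted squared-gradient accumulator
    state["W"] += w                                # running sum of weights
    adaptive = np.sqrt(state["v"] / state["W"]) + eps
    state["m"] = beta * state["m"] + g / adaptive  # heavy-ball style momentum buffer
    return x - (lr / np.sqrt(t)) * state["m"]
```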
Article
Computer Science, Artificial Intelligence
Timothy Castiglia, Shiqiang Wang, Stacy Patterson
Summary: This paper presents a flexible vertical federated learning algorithm (Flex-VFL) that trains nonconvex functions in a distributed system with vertically partitioned data. The algorithm uses a parallel block coordinate descent technique and considers a heterogeneous system in which parties differ in speed and local optimizers. The convergence rate of Flex-VFL is constrained by the parties' speeds and local optimizer parameters, and the algorithm can adaptively adjust its learning rates as these speeds and parameters change.
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS
(2023)
Article
Automation & Control Systems
Shi Pu, Alex Olshevsky, Ioannis Ch Paschalidis
Summary: This article focuses on minimizing the average of cost functions over a network in which agents can communicate and exchange information. It studies the distributed stochastic gradient descent (DSGD) method when only noisy gradient information is available and performs a nonasymptotic convergence analysis. The main contributions are to characterize the transient time needed for DSGD to approach its asymptotic convergence rate and to construct a hard optimization problem proving the sharpness of the obtained result. Numerical experiments demonstrate the tightness of the theoretical results.
IEEE TRANSACTIONS ON AUTOMATIC CONTROL
(2022)
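For reference, the DSGD iteration analyzed in this line of work is typically written in the standard consensus-plus-gradient form below (standard notation, not quoted from the article), where $W = [w_{ij}]$ is a mixing matrix over the network and $g_i$ is agent $i$'s noisy gradient of its local cost $f_i$:

\[
x_i^{k+1} = \sum_{j=1}^{n} w_{ij}\, x_j^{k} - \alpha_k\, g_i\!\left(x_i^{k}\right),
\qquad
\mathbb{E}\!\left[\, g_i\!\left(x_i^{k}\right) \mid x_i^{k} \,\right] = \nabla f_i\!\left(x_i^{k}\right).
\]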
Article
Operations Research & Management Science
Vassilis Apidopoulos, Nicolo Ginatta, Silvia Villa
Summary: This study focuses on the convergence of trajectories in the framework of convex and non-convex smooth optimization, specifically for the Heavy Ball dynamical system with a constant damping coefficient. By applying the Polyak-Łojasiewicz condition, new linear convergence rates are derived for the associated trajectory in terms of objective function values, without assuming uniqueness of the minimizer.
JOURNAL OF GLOBAL OPTIMIZATION
(2022)
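As background (standard definitions, not quoted from the article), the Heavy Ball dynamical system with constant damping coefficient $\gamma > 0$ and the Polyak-Łojasiewicz condition with constant $\mu > 0$ read

\[
\ddot{x}(t) + \gamma\, \dot{x}(t) + \nabla f\big(x(t)\big) = 0,
\qquad
\tfrac{1}{2}\,\big\|\nabla f(x)\big\|^{2} \ \ge\ \mu\,\big(f(x) - \inf f\big) \quad \text{for all } x.
\]

The PL condition controls suboptimality through the gradient norm without requiring a unique minimizer, which is what allows linear rates of the kind described above to hold in that generality.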
Article
Computer Science, Artificial Intelligence
Samer Saab, Shashi Phoha, Minghui Zhu, Asok Ray
Summary: The paper applies the heavy-ball method to large-scale machine learning problems and proposes an adaptive approach for estimating its optimal hyper-parameters. The effectiveness of the method is demonstrated through experiments and comparisons with other optimizers.
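For context, the underlying iteration is the classic Polyak heavy-ball update sketched below; the paper's adaptive estimation of the step size and momentum coefficient is not reproduced, and the names and default values are illustrative.

```python
def heavy_ball_step(x, x_prev, grad, lr=0.01, momentum=0.9):
    """Classic heavy-ball (Polyak momentum) iteration:
        x_{k+1} = x_k - lr * grad(x_k) + momentum * (x_k - x_{k-1}).
    Returns the new iterate together with the current one, ready for the next call."""
    x_next = x - lr * grad(x) + momentum * (x - x_prev)
    return x_next, x
```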
Article
Computer Science, Artificial Intelligence
Navid Azizan, Sahin Lale, Babak Hassibi
Summary: Research has shown that in overparameterized scenarios, stochastic mirror descent (SMD) algorithms converge to the global minimum closest to the initialization point, achieving implicit regularization. Experimental results indicate a significant difference in the generalization performance of different SMD algorithms, with the $\ell_\infty$-norm variant consistently outperforming SGD and the $\ell_1$-norm variant.
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS
(2022)
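For reference, the stochastic mirror descent update with mirror potential $\psi$ is the standard iteration below (standard notation, not quoted from the article); taking $\psi(x) = \tfrac{1}{2}\|x\|_2^2$ recovers SGD, while other potentials yield the $\ell_1$- and $\ell_\infty$-type variants compared in the experiments:

\[
\nabla\psi\big(x_{k+1}\big) = \nabla\psi\big(x_k\big) - \eta\, \nabla f_{i_k}\big(x_k\big).
\]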
Article
Automation & Control Systems
Xinlei Yi, Shengjun Zhang, Tao Yang, Tianyou Chai, Karl Henrik Johansson
Summary: This paper proposes a distributed primal-dual stochastic gradient descent algorithm for the distributed nonconvex optimization problem. The algorithm achieves a linear speedup convergence rate for general nonconvex cost functions, and its output converges linearly to a neighborhood of a global optimum. Numerical experiments demonstrate the efficiency of the algorithm compared to a baseline centralized SGD and recently proposed distributed SGD algorithms.
IEEE-CAA JOURNAL OF AUTOMATICA SINICA
(2022)
Article
Computer Science, Artificial Intelligence
Weiyu Li, Zhaoxian Wu, Tianyi Chen, Liping Li, Qing Ling
Summary: This article proposes a communication-efficient algorithm for distributed machine learning that introduces a communication-censoring technique to cut down on variable transmissions. The algorithm achieves the same convergence rate as SGD while effectively reducing communication. Numerical experiments show significant communication savings.
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS
(2022)
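The censoring idea can be illustrated with a minimal hypothetical rule: a worker transmits its fresh gradient only when it differs sufficiently from the last transmitted one, and the server otherwise reuses the stale copy. The threshold and function names below are assumptions, not the article's exact censoring rule.

```python
import numpy as np

def maybe_transmit(grad, last_sent, threshold):
    """Communication-censoring rule (illustrative): transmit only if the new
    gradient deviates from the last transmitted one by more than `threshold`."""
    if last_sent is None or np.linalg.norm(grad - last_sent) > threshold:
        return grad, True        # transmit the fresh gradient
    return last_sent, False      # censored: the server keeps using the stale gradient
```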
Article
Mathematics, Applied
Jinda Yang, Haiming Song, Xinxin Li, Di Hou
Summary: This paper develops a block mirror stochastic gradient method for stochastic optimization problems, covering both convex and nonconvex cases, in which the feasible set and the variables are treated as multiple blocks. The proposed method combines features of the classic stochastic mirror descent method and the block coordinate gradient descent method. Using stochastic gradient information supplied by stochastic oracles, it updates all variable blocks in a Gauss-Seidel fashion. Convergence is established in both the convex and nonconvex cases, although the analysis is challenging because the usual unbiasedness assumption on stochastic gradients fails under the Gauss-Seidel update scheme, requiring more specific assumptions. The efficiency of the proposed algorithm is demonstrated on the conditional value-at-risk problem and the stochastic LASSO problem.
JOURNAL OF SCIENTIFIC COMPUTING
(2023)
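A Gauss-Seidel sweep simply updates the blocks one after another within a single iteration, each block seeing the already-updated values of the previous ones. The sketch below uses plain stochastic gradient steps rather than the paper's mirror steps and a hypothetical oracle `stochastic_block_grad`, purely to illustrate that structure.

```python
def gauss_seidel_sweep(blocks, stochastic_block_grad, lr=0.1):
    """One Gauss-Seidel sweep over the variable blocks (illustrative; the
    paper's mirror/Bregman steps and feasible-set handling are omitted)."""
    for b in range(len(blocks)):
        # The partial gradient for block b is evaluated at the current point,
        # which already contains the freshly updated blocks 0..b-1.
        g_b = stochastic_block_grad(blocks, b)
        blocks[b] = blocks[b] - lr * g_b
    return blocks
```

Intuitively, later blocks are updated at points that already depend on the current stochastic sample, so the partial gradients are no longer unbiased at the sweep's starting point; this is the failure of the usual unbiasedness assumption that the summary refers to.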
Article
Automation & Control Systems
Brian Swenson, Ryan Murray, H. Vincent Poor, Soummya Kar
Summary: This paper investigates the convergence and avoidance of saddle points of distributed stochastic gradient descent algorithms in nonconvex and nonsmooth scenarios.
JOURNAL OF MACHINE LEARNING RESEARCH
(2022)
Article
Computer Science, Software Engineering
Vivak Patel
Summary: This work introduces two stopping criteria for SGD that can be applied to nonconvex functions, addresses the issue of heuristic stopping criteria for SGD methods, and proves that the gradient function evaluated at SGD's iterates converges to zero for Bottou-Curtis-Nocedal functions.
MATHEMATICAL PROGRAMMING
(2022)
Article
Computer Science, Artificial Intelligence
Derek Driggs, Junqi Tang, Jingwei Liang, Mike Davies, Carola-Bibiane Schonlieb
Summary: In this work, a novel stochastic proximal alternating linearized minimization algorithm was introduced to solve a class of nonsmooth and nonconvex optimization problems. With variance-reduced stochastic gradient estimators, the proposed method achieved state-of-the-art oracle complexities and demonstrated efficacy in various numerical examples.
SIAM JOURNAL ON IMAGING SCIENCES
(2021)
Article
Computer Science, Artificial Intelligence
XianFu Cheng, YanQing Yao, Liying Zhang, Ao Liu, Zhoujun Li
Summary: Deep learning techniques based on neural networks have achieved significant results, but model training carries privacy risks. To address this issue, we propose an improved differentially private stochastic gradient descent algorithm that optimizes how the privacy loss is allocated. Our experiments show that the method improves model quality while protecting privacy.
INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS
(2022)
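For context, a baseline differentially private SGD step (per-example clipping plus Gaussian noise, in the style of standard DP-SGD) is sketched below; the paper's improved allocation of the privacy loss across iterations is not reproduced, and parameter names and defaults are illustrative.

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, lr=0.1, clip_norm=1.0, noise_multiplier=1.0):
    """Baseline DP-SGD step: clip each per-example gradient to norm `clip_norm`,
    average, and add Gaussian noise scaled by `noise_multiplier * clip_norm`."""
    clipped = [g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
               for g in per_example_grads]
    mean_grad = np.mean(clipped, axis=0)
    noise = np.random.normal(
        0.0, noise_multiplier * clip_norm / len(per_example_grads), size=mean_grad.shape)
    return params - lr * (mean_grad + noise)
```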
Article
Computer Science, Artificial Intelligence
Xinyu Peng, Fei-Yue Wang, Li Li
Summary: This article studies a new data selection method called the "nontypicality sampling scheme" to improve the generalization performance of deep neural networks. The scheme biases the solution towards wider minima, resulting in better performance. Experimental results confirm the effectiveness of the scheme and suggest its potential benefits for various variants of minibatch stochastic gradient descent.
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS
(2023)
Article
Automation & Control Systems
Xinlei Yi, Shengjun Zhang, Tao Yang, Karl H. Johansson
Summary: In this paper, we propose two distributed zeroth-order optimization algorithms for solving a stochastic distributed nonconvex optimization problem with limited information. The algorithms achieve a linear speedup convergence rate for smooth cost functions, so processing performance can be systematically improved by adding more agents. This is the first linear speedup result for distributed zeroth-order optimization algorithms.
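Zeroth-order methods replace the gradient with an estimate built from function values only; a standard two-point estimator with smoothing radius $\delta > 0$ and random direction $u$ (a standard construction, not necessarily the exact one used in the paper) is

\[
\hat{g}(x) = \frac{f(x + \delta u) - f(x - \delta u)}{2\delta}\, u .
\]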
Article
Mathematics, Applied
Yunwen Lei, Ding-Xuan Zhou
APPLIED AND COMPUTATIONAL HARMONIC ANALYSIS
(2020)
Article
Computer Science, Information Systems
Yunwen Lei, Urun Dogan, Ding-Xuan Zhou, Marius Kloft
IEEE TRANSACTIONS ON INFORMATION THEORY
(2019)
Article
Mathematics, Applied
Yunwen Lei, Ding-Xuan Zhou
JOURNAL OF FOURIER ANALYSIS AND APPLICATIONS
(2019)
Article
Automation & Control Systems
Guiying Li, Shuyang Wang, Zhigang Yu
Summary: This article introduces a novel approach to control robotic manipulators with an unknown constant payload, utilizing a nonlinear disturbance observer to estimate external forces induced by the payload. An adaptive technique is used to design the observer gain, which is then integrated with sliding mode control to alleviate chattering. The effectiveness of the proposed methods is validated through simulation results.
PROCEEDINGS OF THE INSTITUTION OF MECHANICAL ENGINEERS PART I-JOURNAL OF SYSTEMS AND CONTROL ENGINEERING
(2021)
Article
Mathematics, Applied
Puyu Wang, Yunwen Lei, Yiming Ying, Hai Zhang
Summary: This paper investigates the privacy and generalization guarantees of differentially private stochastic gradient descent algorithms for stochastic convex optimization, relaxing traditionally strict assumptions by using output and gradient perturbations for non-smooth convex losses.
APPLIED AND COMPUTATIONAL HARMONIC ANALYSIS
(2022)
Article
Computer Science, Artificial Intelligence
Yunwen Lei, Ke Tang
Summary: This paper develops novel learning rates of SGD for nonconvex learning by presenting high-probability bounds for both computational and statistical errors. It shows that the complexity of SGD iterates grows in a controllable manner with respect to the iteration number, shedding light on implicit regularization. By connecting the study to Rademacher chaos complexities, it also slightly refines existing results on the uniform convergence of gradients.
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
(2021)
Proceedings Paper
Computer Science, Artificial Intelligence
Zhanliang Huang, Yunwen Lei, Ata Kaban
Summary: This paper presents a framework that leverages unlabelled data to reduce noise requirement and improve predictive performance in differentially private decision forests. The framework includes a median splitting criterion for balanced leaves, a geometric privacy budget allocation technique, and a random sampling technique for accurate computation of private splitting points.
MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2022, PT IV
(2023)
Article
Computer Science, Artificial Intelligence
Guiying Li, Peng Yang, Chao Qian, Richang Hong, Ke Tang
Summary: This article proposes a novel stage-wise pruning method for recurrent neural networks (RNN), which can effectively prune both feedforward and RNN layers. Experimental results show that the proposed method performs significantly better than commonly used RNN pruning methods.
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS
(2022)
Proceedings Paper
Computer Science, Artificial Intelligence
Liang Wu, Antoine Ledent, Yunwen Lei, Marius Kloft
Summary: This paper initiates the generalization analysis of regularized vector-valued learning algorithms by presenting bounds with a mild dependency on the output dimension and a fast rate in the sample size. The analysis relaxes existing restrictive assumptions on the hypothesis space, the smoothness of loss functions, and low-noise conditions.
THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE
(2021)
Proceedings Paper
Computer Science, Artificial Intelligence
Antoine Ledent, Waleed Mustafa, Yunwen Lei, Marius Kloft
Summary: The study presents generalization error bounds for deep learning with two key improvements: no explicit dependence on the number of classes, and a Rademacher analysis of DNNs adapted to incorporate weight sharing. The bounds scale with the norms of the parameter matrices rather than the number of parameters, and show that each convolutional filter contributes only once to the bound.
THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE
(2021)
Article
Automation & Control Systems
Yunwen Lei, Ting Hu, Ke Tang
Summary: This paper provides optimal capacity-independent and capacity-dependent learning rates for SGD with general convex loss functions, without the need for bounded subgradient or smoothness assumptions, and stated with high probability. This improvement is achieved through a refined estimate on the norm of SGD iterates based on martingale analysis and concentration inequalities on empirical processes.
JOURNAL OF MACHINE LEARNING RESEARCH
(2021)
Proceedings Paper
Automation & Control Systems
Zhigang Yu, Guiying Li
PROCEEDINGS OF THE 2019 31ST CHINESE CONTROL AND DECISION CONFERENCE (CCDC 2019)
(2019)
Article
Information Science & Library Science
Juncheng Wang, Guiying Li
ELECTRONIC LIBRARY
(2019)