☆ 4.6 Article Proceedings Paper

Theoretical analysis of batch and on-line training for gradient descent learning in neural networks

NEUROCOMPUTING (2009)

Journal

NEUROCOMPUTING

Volume 73, Issue 1-3, Pages 151-159

Publisher

ELSEVIER

DOI: 10.1016/j.neucom.2009.05.017

Keywords

Neural networks; Gradient descent learning; Batch training; On-line training; Quadratic loss functions; Convergence

Categories

Computer Science, Artificial Intelligence

Ask authors/readers for more resources

Protocol

Community support

Reagent

Community support

Abstract

In this study, we theoretically analyze two essential training schemes for gradient descent learning in neural networks: batch and on-line training. The convergence properties of the two schemes applied to quadratic loss functions are analytically investigated. We quantify the convergence of each training scheme to the optimal weight using the absolute value of the expected difference (Measure I) and the expected squared difference (Measure 2) between the optimal weight and the weight computed by the scheme. Although on-line training has several advantages over batch training with respect to the first measure, it does not converge to the optimal weight with respect to the second measure if the variance of the per-instance gradient remains constant. However, if the variance decays exponentially, then on-line training converges to the optimal weight with respect to Measure 2. Our analysis reveals the exact degrees to which the training set size, the variance of the per-instance gradient, and the learning rate affect the rate of convergence for each scheme. (C) 2009 Elsevier B.V. All rights reserved.

Authors

I am an author on this paper

Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.6

Not enough ratings

Secondary Ratings

Novelty

-

Significance

-

Scientific rigor

-

Rate this paper

Recommended

Article Automation & Control Systems

A proof of convergence for the gradient descent optimization method with random initializations in the training of neural networks with ReLU activation for piecewise linear target functions

Arnulf Jentzen, Adrian Riekert

Summary: In this article, the authors prove that the risk of the considered GD process converges exponentially fast to zero with a positive probability when certain conditions are met. They establish the differentiability and rank conditions for the global minima set, and apply relevant methods to demonstrate the local convergence of the GD optimization method. The research results are significant for the theoretical foundation and optimization algorithms in the field of deep learning.

JOURNAL OF MACHINE LEARNING RESEARCH (2022)

Add to Collection

Article Automation & Control Systems

Training Two-Layer ReLU Networks with Gradient Descent is Inconsistent

David Holzmueller, Ingo Steinwart

Summary: This study proves that two-layer (Leaky)ReLU networks initialized by the widely used method proposed by He et al. (2015) and trained using gradient descent on a least-squares loss are not universally consistent. In certain cases, the network can only find a bad local minimum and essentially performs linear regression, even for non-linear target functions.

JOURNAL OF MACHINE LEARNING RESEARCH (2022)

Add to Collection

Article Engineering, Electrical & Electronic

On the Convergence of Hybrid Server-Clients Collaborative Training

Kun Yang, Shengbo Chen, Cong Shen

Summary: This paper analyzes the model convergence of a new hybrid learning architecture that utilizes the dataset and computation power of the parameter server (PS) for collaborative model training with clients. The architecture combines parallel SGD at clients and sequential SGD at PS, and has shown advantages in terms of accuracy and convergence speed over clients-only and server-only training.

IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS (2023)

Add to Collection

Article Computer Science, Artificial Intelligence

Improved Linear Convergence of Training CNNs With Generalizability Guarantees: A One-Hidden-Layer Case

Shuai Zhang, Meng Wang, Jinjun Xiong, Sijia Liu, Pin-Yu Chen

Summary: This research analyzes the learning problem of one-hidden-layer nonoverlapping convolutional neural networks with ReLU activation function from the perspective of model estimation. The results show that the accelerated gradient descent algorithm can converge to the true parameters (up to the noise level) with a linear rate and faster than vanilla GD. The study also theoretically establishes the sample complexity of the required training samples.

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS (2021)

Add to Collection

Article Computer Science, Software Engineering

Stopping criteria for, and strong convergence of, stochastic gradient descent on Bottou-Curtis-Nocedal functions

Vivak Patel

Summary: This work introduces two stopping criteria for SGD that can be applied to nonconvex functions, addresses the issue of heuristic stopping criteria for SGD methods, and proves the convergence of gradient function evaluated at SGD's iterates to zero for Bottou-Curtis-Nocedal functions.

MATHEMATICAL PROGRAMMING (2022)

Add to Collection

Article Computer Science, Information Systems

Adadb: Adaptive Diff-Batch Optimization Technique for Gradient Descent

Muhammad U. S. Khan, Muhammad Jawad, Samee U. Khan

Summary: Gradient descent is widely used in deep neural networks, but it suffers from slow convergence, which can be improved by methods like momentum, Adam, diffGrad, and AdaBelief. This paper introduces a new optimization technique called adadb, which addresses issues in existing methods and increases the convergence rate.

IEEE ACCESS (2021)

Add to Collection

Article Operations Research & Management Science

Training neural networks from an ergodic perspective

W. Jung, C. A. Morales

Summary: In this research, neural network weights are interpreted as points in a metric space, with the training process viewed as an iterated function system on this space. The study found that starting with initial weights close to the minimum error yields the most effective training method, and provided an ergodic characterization of this efficient training. The findings suggest potential for further optimization advancements through numerical experimentation and the study of dynamical systems theory.

OPTIMIZATION (2023)

Add to Collection

Article Computer Science, Artificial Intelligence

Batch Gradient Training Method with Smoothing Group L0 Regularization for Feedfoward Neural Networks

Ying Zhang, Jianing Wei, Dongpo Xu, Huisheng Zhang

Summary: In this paper, a batch gradient training method with smoothing Group L-0 regularization (BGSGL(0)) is proposed for pruning neural networks. BGSGL(0) overcomes the NP-hard nature of L-0 regularization and prunes the network from the neuron level.

NEURAL PROCESSING LETTERS (2023)

Add to Collection

Article Mathematics, Applied

A proof of convergence for stochastic gradient descent in the training of artificial neural networks with ReLU activation for constant target functions

Arnulf Jentzen, Adrian Riekert

Summary: This article studies the stochastic gradient descent (SGD) optimization method in training fully connected feedforward artificial neural networks with ReLU activation. The main result of this work shows that the risk of the SGD process converges to zero if the target function is constant. The considered artificial neural networks consist of one input layer, one hidden layer, and one output layer.

ZEITSCHRIFT FUR ANGEWANDTE MATHEMATIK UND PHYSIK (2022)

Add to Collection

Article Mathematics, Applied

Adaptive three-term PRP algorithms without gradient Lipschitz continuity condition for nonconvex functions

Gonglin Yuan, Heshu Yang, Mengxiang Zhang

Summary: This paper introduces two conjugate gradient methods for solving the gradient non-Lipschitz continuity problem, and points out that these methods perform competitively in numerical experiments. Algorithm 1, combined with the MPRP algorithm and independent of line search technology, achieves global convergence; Algorithm 2 further improves the performance based on Algorithm 1, achieving global convergence independently of line search technique.

NUMERICAL ALGORITHMS (2022)

Add to Collection

Article Computer Science, Artificial Intelligence

Automatic, dynamic, and nearly optimal learning rate specification via local quadratic approximation

Yingqiu Zhu, Danyang Huang, Yuan Gao, Rui Wu, Yu Chen, Bo Zhang, Hansheng Wang

Summary: The proposed optimization method based on local quadratic approximation dynamically adjusts the learning rate and achieves nearly maximum reduction in the loss function, demonstrating automatic learning rate determination and computational efficiency.

NEURAL NETWORKS (2021)

Add to Collection

Article Computer Science, Artificial Intelligence

Convergence of Batch Gradient Method for Training of Pi-Sigma Neural Network with Regularizer and Adaptive Momentum Term

Qinwei Fan, Le Liu, Qian Kang, Li Zhou

Summary: This paper introduces the characteristics and challenges of Pi-sigma neural network (PSNN) and proposes an improved sparse-response feed-forward algorithm. The algorithm uses an adaptive momentum term and a group lasso regularizer, which enables fast convergence and obtains sparse and efficient neural networks.

NEURAL PROCESSING LETTERS (2023)

Add to Collection

Article Computer Science, Software Engineering

Gradient descent for quadratic functions using geometric mean and the Kai Fang method

KwangCheol Rim, Pankoo Kim, Hoon Ko

Summary: This study transforms the gradient calculation from conventional quadratic gradient descent algorithms into a root extraction calculation using geometric means. By introducing the Kai Fang method, an improved quadratic gradient descent method is proposed.

CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE (2023)

Add to Collection

Article Statistics & Probability

Statistical Analysis of Fixed Mini-Batch Gradient Descent Estimator

Haobo Qi, Feifei Wang, Hansheng Wang

Summary: This study presents a fixed mini-batch gradient descent (FMGD) algorithm for optimizing problems with massive datasets. FMGD divides the sample into non-overlapping partitions and keeps them fixed throughout the algorithm. By calculating the gradients on each fixed mini-batch sequentially, the computation cost for each iteration is significantly reduced. This makes FMGD computationally efficient and practically feasible.

JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS (2023)

Add to Collection

Article Mathematics

An Improved Modification of Accelerated Double Direction and Double Step-Size Optimization Schemes

Milena J. Petrovic, Dragana Valjarevic, Dejan Ilic, Aleksandar Valjarevic, Julija Mladenovic

Summary: The proposed improved variant of the accelerated gradient optimization models merges the positive features of different models to define a simpler and more effective iterative method. Convergence analysis shows that the method is at least linearly convergent for uniformly convex and strictly convex functions. Numerical test results confirm the efficiency of the developed model in terms of CPU time, the number of iterations, and function evaluations.

MATHEMATICS (2022)

Add to Collection

No Data Available

Article Computer Science, Artificial Intelligence

3D-KCPNet: Efficient 3DCNNs based on tensor mapping theory

Rui Lv, Dingheng Wang, Jiangbin Zheng, Zhao-Xu Yang

Summary: In this paper, the authors investigate tensor decomposition for neural network compression. They analyze the convergence and precision of tensor mapping theory, validate the rationality of tensor mapping and its superiority over traditional tensor approximation based on the Lottery Ticket Hypothesis. They propose an efficient method called 3D-KCPNet to compress 3D convolutional neural networks using the Kronecker canonical polyadic (KCP) tensor decomposition. Experimental results show that 3D-KCPNet achieves higher accuracy compared to the original baseline model and the corresponding tensor approximation model.

NEUROCOMPUTING (2024)

Add to Collection

Article Computer Science, Artificial Intelligence

Personalized robotic control via constrained multi-objective reinforcement learning

Xiangkun He, Zhongxu Hu, Haohan Yang, Chen Lv

Summary: In this paper, a novel constrained multi-objective reinforcement learning algorithm is proposed for personalized end-to-end robotic control with continuous actions. The approach trains a single model using constraint design and a comprehensive index to achieve optimal policies based on user-specified preferences.

NEUROCOMPUTING (2024)

Add to Collection

Article Computer Science, Artificial Intelligence

Overlapping community detection using expansion with contraction

Zhijian Zhuo, Bilian Chen, Shenbao Yu, Langcai Cao

Summary: In this paper, a novel method called Expansion with Contraction Method for Overlapping Community Detection (ECOCD) is proposed, which utilizes non-negative matrix factorization to obtain disjoint communities and applies expansion and contraction processes to adjust the degree of overlap. ECOCD is applicable to various networks with different properties and achieves high-quality overlapping community detection.

NEUROCOMPUTING (2024)

Add to Collection

Article Computer Science, Artificial Intelligence

High-compressed deepfake video detection with contrastive spatiotemporal distillation

Yizhe Zhu, Chunhui Zhang, Jialin Gao, Xin Sun, Zihan Rui, Xi Zhou

Summary: In this work, the authors propose a Contrastive Spatio-Temporal Distilling (CSTD) approach to improve the detection of high-compressed deepfake videos. The approach leverages spatial-frequency cues and temporal-contrastive alignment to fully exploit spatiotemporal inconsistency information.

NEUROCOMPUTING (2024)

Add to Collection

Review Computer Science, Artificial Intelligence

A review of coverless steganography

Laijin Meng, Xinghao Jiang, Tanfeng Sun

Summary: This paper provides a review of coverless steganographic algorithms, including the development process, known contributions, and general issues in image and video algorithms. It also discusses the security of coverless steganography from theoretical analysis to actual investigation for the first time.

NEUROCOMPUTING (2024)

Add to Collection

Article Computer Science, Artificial Intelligence

Confidence-based interactable neural-symbolic visual question answering

Yajie Bao, Tianwei Xing, Xun Chen

Summary: Visual question answering requires processing multi-modal information and effective reasoning. Neural-symbolic learning is a promising method, but current approaches lack uncertainty handling and can only provide a single answer. To address this, we propose a confidence based neural-symbolic approach that evaluates NN inferences and conducts reasoning based on confidence.

NEUROCOMPUTING (2024)

Add to Collection

Article Computer Science, Artificial Intelligence

A framework-based transformer and knowledge distillation for interior style classification

Anh H. Vo, Bao T. Nguyen

Summary: Interior style classification is an interesting problem with potential applications in both commercial and academic domains. This project proposes a method named ISC-DeIT, which combines data-efficient image transformer architectures and knowledge distillation, to address the interior style classification problem. Experimental results demonstrate a significant improvement in predictive accuracy compared to other state-of-the-art methods.

NEUROCOMPUTING (2024)

Add to Collection

Article Computer Science, Artificial Intelligence

Improving robustness for vision transformer with a simple dynamic scanning augmentation

Shashank Kotyan, Danilo Vasconcellos Vargas

Summary: This article introduces a novel augmentation technique called Dynamic Scanning Augmentation to improve the accuracy and robustness of Vision Transformer (ViT). The technique leverages dynamic input sequences to adaptively focus on different patches, resulting in significant changes in ViT's attention mechanism. Experimental results demonstrate that Dynamic Scanning Augmentation outperforms ViT in terms of both robustness to adversarial attacks and accuracy against natural images.

NEUROCOMPUTING (2024)

Add to Collection

Article Computer Science, Artificial Intelligence

Introducing shape priors in Siamese networks for image classification

Hiba Alqasir, Damien Muselet, Christophe Ducottet

Summary: The article proposes a solution to improve the learning process of a classification network by providing shape priors, reducing the need for annotated data. The solution is tested on cross-domain digit classification tasks and a video surveillance application.

NEUROCOMPUTING (2024)

Add to Collection

Article Computer Science, Artificial Intelligence

Neural dynamics solver for time-dependent infinity-norm optimization based on ACP framework with robot application

Dexiu Ma, Mei Liu, Mingsheng Shang

Summary: This paper proposes a method using neural dynamics solvers to solve infinity-norm optimization problems. Two improved solvers are constructed and their effectiveness and superiority are demonstrated through theoretical analysis and simulation experiments.

NEUROCOMPUTING (2024)

Add to Collection

Article Computer Science, Artificial Intelligence

cpp-AIF: A multi-core C plus plus implementation of Active Inference for Partially Observable Markov Decision Processes

Francesco Gregoretti, Giovanni Pezzulo, Domenico Maisto

Summary: Active Inference is a computational framework that uses probabilistic inference and variational free energy minimization to describe perception, planning, and action. cpp-AIF is a header-only C++ library that provides a powerful tool for implementing Active Inference for Partially Observable Markov Decision Processes through multi-core computing. It is cross-platform and improves performance, memory management, and usability compared to existing software.

NEUROCOMPUTING (2024)

Add to Collection

Article Computer Science, Artificial Intelligence

Predicting stock market trends with self-supervised learning

Zelin Ying, Dawei Cheng, Cen Chen, Xiang Li, Peng Zhu, Yifeng Luo, Yuqi Liang

Summary: This paper proposes a novel stock market trends prediction framework called SMART, which includes a self-supervised stock technical data sequence embedding model S3E. By training with multiple self-supervised auxiliary tasks, the model encodes stock technical data sequences into embeddings and uses the learned sequence embeddings for predicting stock market trends. Extensive experiments on China A-Shares market and NASDAQ market prove the high effectiveness of our model in stock market trends prediction, and its effectiveness is further validated in real-world applications in a leading financial service provider in China.

NEUROCOMPUTING (2024)

Add to Collection

Article Computer Science, Artificial Intelligence

DHGAT: Hyperbolic representation learning on dynamic graphs via attention networks

Hao Li, Hao Jiang, Dongsheng Ye, Qiang Wang, Liang Du, Yuanyuan Zeng, Liu Yuan, Yingxue Wang, C. Chen

Summary: DHGAT1, a dynamic hyperbolic graph attention network, utilizes hyperbolic metric properties to embed dynamic graphs. It employs a spatiotemporal self-attention mechanism and weighted node representations, resulting in excellent performance in link prediction tasks.

NEUROCOMPUTING (2024)

Add to Collection

Article Computer Science, Artificial Intelligence

Progressive network based on detail scaling and texture extraction: A more general framework for image deraining

Jiehui Huang, Zhenchao Tang, Xuedong He, Jun Zhou, Defeng Zhou, Calvin Yu-Chian Chen

Summary: This study proposes a progressive learning multi-scale feature blending model for image deraining tasks. The model utilizes detail dilation and texture extraction to improve the restoration of rainy images. Experimental results show that the model achieves near state-of-the-art performance in rain removal tasks and exhibits better rain removal realism.

NEUROCOMPUTING (2024)

Add to Collection

Article Computer Science, Artificial Intelligence

Stabilization and synchronization control for discrete-time complex networks via the auxiliary role of edges subsystem

Lizhi Liu, Zilin Gao, Yinhe Wang, Yongfu Li

Summary: This paper proposes a novel discrete-time interconnected model for depicting complex dynamical networks. The model consists of nodes and edges subsystems, which consider the dynamic characteristic of both nodes and edges. By designing control strategies and coupling modes, the stabilization and synchronization of the network are achieved. Simulation results demonstrate the effectiveness of the proposed methods.

NEUROCOMPUTING (2024)

Add to Collection

© Peeref 2019-2024. All rights reserved.