Review
Mathematics
Yingjie Tian, Yuqi Zhang, Haibin Zhang
Summary: In the age of artificial intelligence, finding the best approach to handling massive data is a challenging task. Stochastic gradient descent (SGD) stands out among machine learning optimization methods as simple yet highly effective. This study examines various contemporary deep learning applications, including natural language processing (NLP), visual data processing, and voice and audio processing. The study also presents SGD and its variants as available in the PyTorch optimizer library, such as SGD, Adagrad, Adadelta, RMSprop, Adam, and AdamW. Additionally, theoretical conditions for the applicability of these methods are proposed, highlighting the gap between theoretical convergence guarantees and practical implementation.
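For orientation, all of the optimizers the review surveys ship with torch.optim; a minimal sketch of how each is instantiated follows (the toy model and hyperparameter values are placeholders, not the review's settings):

import torch
import torch.nn as nn

model = nn.Linear(10, 2)           # toy model standing in for any network
params = list(model.parameters())  # materialize so each optimizer gets the full list

optimizers = {
    "SGD":      torch.optim.SGD(params, lr=0.01, momentum=0.9),
    "Adagrad":  torch.optim.Adagrad(params, lr=0.01),
    "Adadelta": torch.optim.Adadelta(params, rho=0.9),
    "RMSprop":  torch.optim.RMSprop(params, lr=0.001, alpha=0.99),
    "Adam":     torch.optim.Adam(params, lr=0.001, betas=(0.9, 0.999)),
    "AdamW":    torch.optim.AdamW(params, lr=0.001, weight_decay=0.01),
}

x, y = torch.randn(8, 10), torch.randint(0, 2, (8,))
opt = optimizers["Adam"]
opt.zero_grad()
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
opt.step()  # one SGD-family update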
Article
Computer Science, Artificial Intelligence
Anuraganand Sharma
Summary: The proposed guided SGD algorithm compensates for the gradient deviation caused by delay and uses consistent examples to steer the convergence of SGD, reducing the impact of delay on neural network models.
APPLIED SOFT COMPUTING
(2021)
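As a generic point of comparison only (this is a common staleness-damping heuristic, not Sharma's guided-SGD correction, which steers convergence via consistent examples), the basic problem and a naive fix look like this:

import numpy as np

rng = np.random.default_rng(0)
w = np.zeros(3)
history = [w.copy()]
for t in range(300):
    delay = rng.integers(0, 4)                        # worker read stale weights
    w_stale = history[max(0, len(history) - 1 - delay)]
    grad = w_stale - np.array([1.0, 2.0, 3.0]) + rng.normal(0, 0.1, 3)
    w = w - (0.1 / (1 + delay)) * grad                # damp stale gradients by delay
    history.append(w.copy())
print(w)  # near [1, 2, 3] despite delayed gradients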
Article
Automation & Control Systems
Yi-Rui Yang, Wu-Jun Li
Summary: Distributed learning is widely used in various fields such as cluster-based large-scale learning, federated learning, and edge computing. Byzantine learning, which deals with failure or attack in distributed learning, has gained attention recently. This paper proposes a novel method called buffered asynchronous stochastic gradient descent (BASGD) for asynchronous Byzantine learning (ABL). BASGD is the first ABL method that can resist non-omniscient attacks without storing any instances on the server. An improved variant of BASGD, called BASGD with momentum (BASGDm), is also introduced. Both BASGD and BASGDm have a wider scope of application than existing ABL methods, and both are proven to converge and to resist failure or attack. Empirical results demonstrate the superior performance of our methods compared to existing ABL baselines in the presence of failure or attack.
JOURNAL OF MACHINE LEARNING RESEARCH
(2023)
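A simplified sketch of the buffering idea described above (the buffer assignment and the coordinate-wise-median aggregation rule are illustrative choices, not the paper's exact specification): gradients arriving asynchronously are pooled into B buffers, and the server updates only once every buffer is non-empty.

import numpy as np

class BufferedServer:
    def __init__(self, dim, num_buffers, lr=0.1):
        self.w = np.zeros(dim)
        self.B = num_buffers
        self.lr = lr
        self.buffers = [[] for _ in range(num_buffers)]

    def receive(self, worker_id, grad):
        self.buffers[worker_id % self.B].append(grad)
        if all(len(buf) > 0 for buf in self.buffers):
            means = np.stack([np.mean(buf, axis=0) for buf in self.buffers])
            robust_grad = np.median(means, axis=0)  # resists a minority of bad buffers
            self.w -= self.lr * robust_grad
            self.buffers = [[] for _ in range(self.B)]

server = BufferedServer(dim=5, num_buffers=3)
rng = np.random.default_rng(0)
for t in range(30):
    wid = rng.integers(0, 9)                        # 9 asynchronous workers
    grad = server.w - 1.0 + rng.normal(0, 0.1, 5)   # noisy gradient of ||w - 1||^2 / 2
    if wid == 8:
        grad = rng.normal(0, 100, 5)                # one Byzantine worker sends garbage
    server.receive(wid, grad)
print(server.w)  # drifts toward 1 despite the attacker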
Article
Computer Science, Theory & Methods
Hao Zhang, Tingting Wu, Zhifeng Ma, Feng Li, Jie Liu
Summary: Distributed stochastic gradient descent (SGD) algorithms are commonly used to speed up deep learning model training by employing multiple computational devices in parallel. Top-k sparsification is an effective method to reduce communication overhead, but traditional implementations have limitations in training efficiency and model performance. This paper introduces a Dynamic Layer-wise Sparsification (DLS) mechanism and its extensions, DLS(s), which balance efficiency and performance by adjusting the sparsity ratios of individual layers. Experimental results show that DLS(s) outperforms existing methods in terms of both performance and training time.
FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE
(2023)
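A minimal sketch of the baseline top-k gradient sparsification with error feedback; the DLS idea would replace the single fixed ratio below with a per-layer, dynamically adjusted value, a detail this sketch omits:

import torch

def topk_sparsify(grad, ratio, residual):
    """Keep the largest-magnitude entries of (grad + residual); carry the rest over."""
    acc = grad + residual
    k = max(1, int(acc.numel() * ratio))
    flat = acc.flatten()
    idx = flat.abs().topk(k).indices
    sparse = torch.zeros_like(flat)
    sparse[idx] = flat[idx]
    new_residual = (flat - sparse).view_as(acc)  # error feedback for next round
    return sparse.view_as(acc), new_residual

g = torch.randn(4, 4)
res = torch.zeros_like(g)
sparse_g, res = topk_sparsify(g, ratio=0.25, residual=res)
print(sparse_g)  # only 4 of 16 entries would be transmitted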
Article
Computer Science, Artificial Intelligence
Kazuki Osawa, Yohei Tsuji, Yuichiro Ueno, Akira Naruse, Chuan-Sheng Foo, Rio Yokota
Summary: This paper proposes a scalable and practical natural gradient descent (SP-NGD) method for large-scale distributed training of deep neural networks. It achieves similar generalization performance to models trained with first-order optimization methods, but with accelerated convergence.
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
(2022)
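For context, a generic natural-gradient update on logistic regression (not SP-NGD itself, whose distributed and approximation machinery this sketch omits): the gradient is preconditioned with a damped empirical Fisher matrix instead of being used raw.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X @ np.array([1.0, -2.0, 0.5]) > 0).astype(float)
w = np.zeros(3)

for step in range(100):
    p = 1.0 / (1.0 + np.exp(-X @ w))
    per_example_grads = (p - y)[:, None] * X              # d(logloss)/dw per example
    g = per_example_grads.mean(axis=0)
    F = per_example_grads.T @ per_example_grads / len(X)  # empirical Fisher
    g_nat = np.linalg.solve(F + 1e-3 * np.eye(3), g)      # damping for stability
    w -= 0.5 * g_nat

print(w)  # roughly aligned with the true coefficient direction [1, -2, 0.5]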
Article
Computer Science, Artificial Intelligence
Anirban Das, Timothy Castiglia, Shiqiang Wang, Stacy Patterson
Summary: This study applies federated learning to tiered communication networks and proposes a communication-efficient decentralized training algorithm for two-tiered networks. The algorithm is validated through theoretical analysis and empirical experiments.
ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY
(2022)
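A hedged sketch of two-tier (hierarchical) federated averaging of the kind the paper studies: clients average within each edge group, then the groups are averaged at the global server. The group layout and weighting by sample counts are illustrative choices.

import numpy as np

def weighted_average(models, weights):
    weights = np.asarray(weights, dtype=float)
    weights /= weights.sum()
    return sum(wt * m for wt, m in zip(weights, models))

# tier 1: clients -> edge aggregators
client_models = [np.random.default_rng(i).normal(size=4) for i in range(6)]
client_sizes = [100, 50, 80, 120, 60, 90]
groups = [(0, 1, 2), (3, 4, 5)]  # which clients report to which edge node

edge_models, edge_sizes = [], []
for g in groups:
    edge_models.append(weighted_average([client_models[i] for i in g],
                                        [client_sizes[i] for i in g]))
    edge_sizes.append(sum(client_sizes[i] for i in g))

# tier 2: edge aggregators -> global server
global_model = weighted_average(edge_models, edge_sizes)
print(global_model)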
Article
Computer Science, Artificial Intelligence
Alexandre Lemire Paquin, Brahim Chaib-draa, Philippe Giguere
Summary: We provide new generalization bounds for stochastic gradient descent in training classifiers with invariances. Our analysis covers both convex and non-convex cases and is based on the stability framework. We investigate angle-wise stability, instead of Euclidean stability in the weights, and consider an invariant distance measure for neural networks. Moreover, we utilize on-average stability to obtain a data-dependent quantity in the bound, which proves more favorable with larger learning rates in our experiments.
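For illustration only, the kind of quantity angle-wise stability tracks is the angle between two weight vectors rather than the Euclidean norm of their difference:

import numpy as np

def angle_distance(w1, w2):
    cos = np.dot(w1, w2) / (np.linalg.norm(w1) * np.linalg.norm(w2))
    return np.arccos(np.clip(cos, -1.0, 1.0))

w, w_perturbed = np.array([1.0, 0.0]), np.array([1.0, 0.1])
print(angle_distance(w, w_perturbed))  # small angle regardless of norm differences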
Article
Computer Science, Theory & Methods
Jungang Yang, Liyao Xiang, Ruidong Chen, Weiting Li, Baochun Li
Summary: This study introduces a new (ε, δ)-differential privacy mechanism, TVG, for protecting privacy in tensor-valued queries, improving utility by applying unimodal differentially private noise. Experimental results demonstrate that TVG outperforms other state-of-the-art mechanisms on tensor-valued queries.
IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY
(2022)
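For orientation only, the standard (ε, δ)-DP Gaussian mechanism applied to a tensor-valued query; TVG's optimized noise distribution differs, and the L2-sensitivity is assumed known here:

import numpy as np

def gaussian_mechanism(query_result, sensitivity, epsilon, delta, rng):
    # classic calibration (Dwork & Roth); valid for epsilon < 1
    sigma = np.sqrt(2.0 * np.log(1.25 / delta)) * sensitivity / epsilon
    return query_result + rng.normal(0.0, sigma, size=query_result.shape)

rng = np.random.default_rng(0)
tensor = np.arange(24, dtype=float).reshape(2, 3, 4)  # toy tensor-valued query
private = gaussian_mechanism(tensor, sensitivity=1.0, epsilon=0.5, delta=1e-5, rng=rng)
print(private - tensor)  # the added noise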
Article
Computer Science, Artificial Intelligence
Kensuke Nakamura, Stefano Soatto, Byung-Woo Hong
Summary: BCSC is a stochastic first-order optimization algorithm that adds a cyclic constraint to the selection of data and parameters, resulting in higher accuracy in image classification. It effectively limits the impact of outliers in the training set and provides better generalization performance within the same number of update iterations.
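A rough sketch of a block-cyclic update schedule in the spirit of BCSC: data and parameters are split into blocks, and each step updates one parameter block on one data block, cycling so every pairing is visited. The block counts and the least-squares problem are arbitrary stand-ins.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(12, 4))
y = X @ np.array([1.0, -1.0, 2.0, 0.5]) + rng.normal(0, 0.01, 12)
w = np.zeros(4)

data_blocks = np.array_split(np.arange(12), 3)       # 3 data blocks
param_blocks = [np.array([0, 1]), np.array([2, 3])]  # 2 parameter blocks

for epoch in range(200):
    for d, db in enumerate(data_blocks):
        pb = param_blocks[(epoch + d) % len(param_blocks)]  # cyclic pairing
        residual = X[db] @ w - y[db]
        grad_block = X[db][:, pb].T @ residual / len(db)
        w[pb] -= 0.1 * grad_block
print(w)  # approaches [1, -1, 2, 0.5]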
Article
Computer Science, Artificial Intelligence
Guozhang Chen, Cheng Kevin Qu, Pulin Gong
Summary: This study reveals the effectiveness of stochastic gradient descent (SGD) in deep learning by investigating its interactions with the geometrical structure of the loss landscape. The study finds that SGD exhibits rich, complex dynamics with superdiffusion in the initial learning phase and subdiffusion at long times. These learning dynamics are observed in different types of deep neural networks and are independent of batch size and learning rate settings. The superdiffusion process is attributed to the interactions between SGD and fractal-like regions of the loss landscape.
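A sketch of the diffusion analysis behind such a study: record the weight vector over training, compute the mean squared displacement MSD(t) ~ t^alpha, and fit alpha; alpha > 1 signals superdiffusion and alpha < 1 subdiffusion. The random walk below is only a stand-in trajectory and yields alpha near 1.

import numpy as np

rng = np.random.default_rng(0)
traj = np.cumsum(rng.normal(size=(2000, 10)), axis=0)  # stand-in weight trajectory
lags = np.unique(np.logspace(0, 3, 20).astype(int))
msd = [np.mean(np.sum((traj[lag:] - traj[:-lag]) ** 2, axis=1)) for lag in lags]
alpha = np.polyfit(np.log(lags), np.log(msd), 1)[0]
print(alpha)  # fitted diffusion exponent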
Article
Automation & Control Systems
Chris Mingard, Guillermo Valle-Perez, Joar Skalse, Ard A. Louis
Summary: The study found that deep neural networks exhibit a strong inductive bias in the overparameterised regime, primarily due to the characteristics of the parameter-function map. The Bayesian posterior probability is a key factor influencing DNNs' generalisation ability and is closely related to the performance of stochastic gradient descent.
JOURNAL OF MACHINE LEARNING RESEARCH
(2021)
Article
Computer Science, Artificial Intelligence
Yawen Li, Wenling Li, Zhe Xue
Summary: This paper investigates the distributed federated learning problem with quantized exchanged information. A novel quantized federated averaging algorithm is proposed and analyzed for both convex and strongly convex loss functions. Extensive experiments using realistic data are conducted to validate the effectiveness of the algorithm.
INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS
(2022)
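A sketch of the quantization step in quantized federated averaging: each client quantizes its model update to a small number of levels before sending it to the server. The unbiased 8-level uniform quantizer here is illustrative, not the paper's scheme.

import numpy as np

def stochastic_quantize(v, levels, rng):
    """Unbiased uniform quantizer: E[quantize(v)] == v."""
    lo, hi = v.min(), v.max()
    if hi == lo:
        return v.copy()
    scaled = (v - lo) / (hi - lo) * (levels - 1)
    floor = np.floor(scaled)
    up = rng.random(v.shape) < (scaled - floor)  # round up with prob = fractional part
    q = floor + up
    return lo + q / (levels - 1) * (hi - lo)

rng = np.random.default_rng(0)
updates = [rng.normal(size=8) for _ in range(4)]                 # 4 clients' updates
quantized = [stochastic_quantize(u, levels=8, rng=rng) for u in updates]
avg = np.mean(quantized, axis=0)                                 # server-side averaging
print(np.abs(avg - np.mean(updates, axis=0)).max())              # small quantization error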
Article
Computer Science, Artificial Intelligence
Pengzhan Guo, Zeyang Ye, Keli Xiao, Wei Zhu
Summary: This paper investigates stochastic optimization in deep learning and proposes a scalable parallel algorithm. The algorithm improves the objective function in neural network models and introduces a new parallel computing strategy for accelerating the training process. Experimental results demonstrate the significant advantages of the algorithm in accelerating deep architecture training.
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING
(2022)
Article
Automation & Control Systems
Duk-Sun Shim, Joseph Shim
Summary: This paper proposes a modified stochastic gradient descent (mSGD) algorithm that uses a random learning rate, reducing the time spent tuning the learning rate. Experiments show that the mSGD algorithm converges better than SGD, and slightly better than the AdaGrad and Adam algorithms.
INTERNATIONAL JOURNAL OF CONTROL AUTOMATION AND SYSTEMS
(2023)
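A minimal sketch of the random-learning-rate idea: each step draws the step size from a fixed distribution instead of tuning a schedule. The uniform range below is an illustrative guess, not the paper's distribution.

import numpy as np

rng = np.random.default_rng(0)
w = np.array([5.0, -3.0])
for t in range(500):
    grad = w - np.array([1.0, 2.0])   # gradient of ||w - (1, 2)||^2 / 2
    grad += rng.normal(0, 0.1, 2)     # stochastic gradient noise
    lr = rng.uniform(0.0, 0.2)        # random learning rate per step
    w -= lr * grad
print(w)  # close to [1, 2] without any learning-rate tuning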
Article
Mathematics
Bodo Herzog
Summary: The aim of this article is to establish a stochastic search algorithm for neural networks based on fractional stochastic processes. Fractional stochastic processes {B_t^H, t ≥ 0}, which generalize standard Brownian motion, capture different properties in order to simulate real-world phenomena. This approach provides new insights into stochastic gradient descent (SGD) algorithms in machine learning, and convergence properties for fractional stochastic processes are exhibited.
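A sketch of exact simulation of fractional Brownian motion B_t^H on a grid via the Cholesky factor of its covariance Cov(B_s^H, B_t^H) = (s^{2H} + t^{2H} - |t - s|^{2H}) / 2; the Hurst index H controls the roughness and memory such an approach exploits. The grid size and H value are arbitrary.

import numpy as np

def fbm(n_steps, hurst, T=1.0, rng=None):
    rng = rng or np.random.default_rng()
    t = np.linspace(T / n_steps, T, n_steps)
    s, u = np.meshgrid(t, t)
    cov = 0.5 * (s**(2 * hurst) + u**(2 * hurst) - np.abs(s - u)**(2 * hurst))
    L = np.linalg.cholesky(cov + 1e-10 * np.eye(n_steps))  # jitter for stability
    return t, L @ rng.standard_normal(n_steps)

t, path = fbm(256, hurst=0.7, rng=np.random.default_rng(0))
print(path[:5])  # H = 0.5 recovers standard Brownian motion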
Article
Computer Science, Hardware & Architecture
Saad Aljubayrin, Jianzhong Qi, Christian S. Jensen, Rui Zhang, Zhen He, Yuan Li
Article
Computer Science, Information Systems
Shengxun Yang, Zhen He, Yi-Ping Phoebe Chen
INFORMATION SCIENCES
(2018)
Article
Substance Abuse
Emmanuel Kuntsche, Abraham Albert Bonela, Gabriel Caluzzi, Mia Miller, Zhen He
DRUG AND ALCOHOL DEPENDENCE
(2020)
Article
Computer Science, Theory & Methods
Matthias Langer, Zhen He, Wenny Rahayu, Yanbo Xue
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS
(2020)
Article
Computer Science, Artificial Intelligence
Ashley Hall, Brandon Victor, Zhen He, Matthias Langer, Marc Elipot, Aiden Nibali, Stuart Morgan
Summary: Swimming coaches must analyze swimmers' performance to adjust race strategy, but the required statistics come from time-consuming manual video annotation. A two-phase deep learning approach called DeepDASH and a hierarchical tracking algorithm called HISORT are proposed to solve computer vision tasks in swimming videos, achieving significant improvements in swimmer head detection, tracking, and stroke detection.
NEURAL COMPUTING & APPLICATIONS
(2021)
Article
Biochemical Research Methods
Robert T. Furbank, Viridiana Silva-Perez, John R. Evans, Anthony G. Condon, Gonzalo M. Estavillo, Wennan He, Saul Newman, Richard Poire, Ashley Hall, Zhen He
Summary: The study demonstrates that the accuracy of predicting wheat photosynthetic and leaf traits using deep learning and ensemble models can be improved over partial least squares regression (PLSR) without overfitting. These models can be flexibly applied across different spectral ranges without compromising accuracy.
Article
Biology
Min Luo, Zhen He, Hui Cui, Yi-Ping Phoebe Chen, Phillip Ward
Summary: We propose a novel attention transfer method for accurately predicting the progression of Alzheimer's disease (AD) in patients with mild cognitive impairment (MCI). Our method trains a 3D convolutional neural network to automatically learn regions of interest (ROI) from images and transfer attention maps instead of model weights. Our method outperformed traditional transfer learning and methods using expert knowledge to define ROI, and the attention map revealed Alzheimer's pathology.
COMPUTERS IN BIOLOGY AND MEDICINE
(2023)
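A hedged sketch of transferring attention maps rather than weights, in the style of attention-transfer distillation (Zagoruyko & Komodakis): a student network is trained to match the teacher's spatial attention (channel-summed squared activations) alongside its task loss. The paper's 3D-CNN and ROI specifics are omitted, and the feature shapes below are placeholders.

import torch
import torch.nn.functional as F

def attention_map(feat):
    """Collapse channels to a normalized spatial attention map."""
    a = feat.pow(2).sum(dim=1)  # (N, H, W)
    return F.normalize(a.flatten(1), dim=1)

def attention_transfer_loss(student_feat, teacher_feat, logits, labels, beta=1e3):
    task = F.cross_entropy(logits, labels)
    at = (attention_map(student_feat) - attention_map(teacher_feat)).pow(2).mean()
    return task + beta * at

s_feat, t_feat = torch.randn(4, 8, 7, 7), torch.randn(4, 16, 7, 7)
logits, labels = torch.randn(4, 3), torch.randint(0, 3, (4,))
print(attention_transfer_loss(s_feat, t_feat, logits, labels))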
Proceedings Paper
Engineering, Electrical & Electronic
Aiden Nibali, Zhen He, Stuart Morgan, Luke Prendergast
2019 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV)
(2019)
Proceedings Paper
Computer Science, Artificial Intelligence
Aiden Nibali, Zhen He, Stuart Morgan, Daniel Greenwood
2017 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW)
(2017)