Article
Computer Science, Artificial Intelligence
Trung Le, Khanh Nguyen, Dinh Phung
Summary: In this paper, we propose a Stochastic Variance-reduced Gradient Descent algorithm for Kernel Online Learning (DualSVRG) that achieves an ε-approximate linear convergence rate and optimal model performance. By introducing the concepts of an instant memory and transformer oracles, our approach reduces the variance in estimating the full gradient and encourages fast training while maintaining model sparsity.
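The DualSVRG specifics (instant memory, transformer oracles) are not spelled out in this summary. As a reference point, here is a minimal sketch of the underlying SVRG variance-reduction idea on a least-squares problem, assuming a periodic full-gradient snapshot; it is the standard template, not the paper's dual or kernel machinery.

```python
import numpy as np

def svrg(X, y, lr=0.1, epochs=20, seed=0):
    """Plain SVRG on least squares: f(w) = 0.5/n * ||Xw - y||^2."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    grad = lambda w, i: X[i] * (X[i] @ w - y[i])      # per-example gradient
    for _ in range(epochs):
        w_snap = w.copy()                              # snapshot point
        mu = X.T @ (X @ w_snap - y) / n                # full gradient at snapshot
        for _ in range(n):                             # inner stochastic pass
            i = rng.integers(n)
            # variance-reduced estimate: unbiased, with variance shrinking
            # as w approaches the snapshot
            g = grad(w, i) - grad(w_snap, i) + mu
            w -= lr * g
    return w

X = np.random.default_rng(1).normal(size=(200, 5))
w_true = np.arange(5.0)
y = X @ w_true + 0.01 * np.random.default_rng(2).normal(size=200)
print(np.round(svrg(X, y), 2))   # close to [0, 1, 2, 3, 4]
```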
Article
Statistics & Probability
Tianyu Zhang, Noah Simon
Summary: The goal of regression is to recover an unknown underlying function from noisy observations. In this study, we propose a sieve stochastic gradient descent estimator for online nonparametric regression, which offers advantages in computational efficiency and memory usage.
ANNALS OF STATISTICS
(2022)
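A sieve estimator expands the unknown function in a basis and fits the coefficients online. The sketch below assumes a cosine basis on [0, 1] and a square-root growth rule for the number of active basis functions; both are illustrative choices, not the paper's exact specification.

```python
import numpy as np

def sieve_sgd(xs, ys, J_max=50, lr0=0.25):
    """Online nonparametric regression: f(x) = sum_j theta_j * phi_j(x),
    with cosine basis phi_j and active sieve size J_t growing like sqrt(t)."""
    theta = np.zeros(J_max)
    for t, (x, y) in enumerate(zip(xs, ys), start=1):
        J = min(J_max, int(np.sqrt(t)) + 1)            # slowly enlarge the sieve
        j = np.arange(1, J + 1)
        phi = np.sqrt(2) * np.cos(np.pi * j * x)       # basis evaluated at x
        resid = theta[:J] @ phi - y
        theta[:J] -= (lr0 / np.sqrt(t)) * resid * phi  # one SGD step on theta
    return theta

rng = np.random.default_rng(0)
xs = rng.uniform(size=5000)
ys = np.sin(2 * np.pi * xs) + 0.1 * rng.normal(size=5000)
theta = sieve_sgd(xs, ys)
x0 = 0.25                                              # predict at a test point
j = np.arange(1, len(theta) + 1)
print(round(theta @ (np.sqrt(2) * np.cos(np.pi * j * x0)), 2))  # roughly sin(pi/2) = 1
```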
Article
Computer Science, Artificial Intelligence
Anuraganand Sharma
Summary: The proposed guided SGD algorithm compensates for the deviation caused by delay and encourages consistent examples to steer the convergence of SGD, reducing the impact of delay on neural network models.
APPLIED SOFT COMPUTING
(2021)
Article
Computer Science, Artificial Intelligence
Anirban Das, Timothy Castiglia, Shiqiang Wang, Stacy Patterson
Summary: This study applies federated learning to tiered communication networks and proposes a communication-efficient decentralized training algorithm for two-tiered networks. The algorithm is validated through theoretical analysis and empirical experiments.
ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY
(2022)
Article
Engineering, Electrical & Electronic
Srujan Teja Thomdapu, Harsh Vardhan, Ketan Rajawat
Summary: This work studies constrained stochastic optimization problems arising in fair classification, fair regression, and queuing system design. The goal is to solve these problems in a large-scale setting with a minimal number of oracle calls. A quasi-gradient saddle point algorithm is proposed that constructs approximate gradients and converges to an optimal, feasible solution almost surely. The proposed algorithm requires O(1/ε^4) data samples to obtain an ε-approximate optimal point with zero constraint violation, outperforming state-of-the-art algorithms in terms of convergence rate on fair classification and fair regression problems.
IEEE TRANSACTIONS ON SIGNAL PROCESSING
(2023)
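The paper's quasi-gradient construction is not detailed in this summary. For orientation, here is the generic stochastic saddle-point template it refines (primal descent, projected dual ascent on the Lagrangian), on a toy quadratic with one linear constraint; all problem data below are hypothetical.

```python
import numpy as np

# Template for min E[f(w)] s.t. g(w) <= 0 via L(w, lam) = f(w) + lam * g(w):
# alternate a noisy descent step on w and a projected ascent step on lam.
rng = np.random.default_rng(0)
w, lam, lr = np.zeros(2), 0.0, 0.05

f_grad = lambda w, xi: (w - np.array([3.0, 1.0])) + xi  # noisy grad of 0.5||w - c||^2
g = lambda w: w[0] + w[1] - 2.0                         # constraint g(w) <= 0
g_grad = np.array([1.0, 1.0])

for t in range(20000):
    xi = 0.1 * rng.normal(size=2)                       # stochastic gradient noise
    w -= lr * (f_grad(w, xi) + lam * g_grad)            # primal descent
    lam = max(0.0, lam + lr * g(w))                     # dual ascent, lam >= 0
print(np.round(w, 2), round(g(w), 3))   # near the constrained optimum (2, 0), g ~ 0
```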
Review
Mathematics
Yingjie Tian, Yuqi Zhang, Haibin Zhang
Summary: In the age of artificial intelligence, finding the best approach to handling massive data is a challenging task. Stochastic gradient descent (SGD) stands out among machine learning methods as it is simple yet highly effective. This study examines various contemporary deep learning applications, including natural language processing (NLP), visual data processing, and voice and audio processing. It also presents the versions of SGD and its variants available in the PyTorch optimizer package, such as SGD, Adagrad, Adadelta, RMSprop, Adam, and AdamW. Additionally, theoretical conditions for the applicability of these methods are proposed, highlighting the existing gap between theoretical convergence and practical implementation.
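Since the review is organized around the torch.optim variants it names, the snippet below shows how those optimizers are drop-in interchangeable in a standard training loop; the linear model and synthetic data are placeholders.

```python
import torch

# Only the optimizer constructor changes; the training loop stays identical.
X = torch.randn(256, 10)
y = X @ torch.arange(10, dtype=torch.float32) + 0.1 * torch.randn(256)

for Opt in [torch.optim.SGD, torch.optim.Adagrad, torch.optim.Adadelta,
            torch.optim.RMSprop, torch.optim.Adam, torch.optim.AdamW]:
    model = torch.nn.Linear(10, 1)
    opt = Opt(model.parameters(), lr=1e-2)
    for _ in range(200):
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(model(X).squeeze(-1), y)
        loss.backward()                    # compute gradients
        opt.step()                         # variant-specific update rule
    print(f"{Opt.__name__:>8s}: final loss {loss.item():.3f}")
```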
Article
Statistics & Probability
Haobo Qi, Feifei Wang, Hansheng Wang
Summary: This study presents a fixed mini-batch gradient descent (FMGD) algorithm for optimization problems with massive datasets. FMGD divides the sample into non-overlapping partitions that are kept fixed throughout the algorithm. Because the gradients are computed on each fixed mini-batch sequentially, the computational cost per iteration is significantly reduced, making FMGD computationally efficient and practically feasible.
JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS
(2023)
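A minimal sketch of the FMGD idea as described: draw the partition once, hold it fixed, and sweep the fixed mini-batches sequentially. The least-squares objective and all constants are illustrative.

```python
import numpy as np

def fmgd(X, y, n_batches=10, lr=0.5, epochs=200, seed=0):
    """Fixed mini-batch GD for least squares: the partition is drawn once
    and held fixed; each iteration uses the next fixed batch in sequence."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    idx = rng.permutation(n)
    batches = np.array_split(idx, n_batches)   # fixed, non-overlapping partition
    w = np.zeros(d)
    for _ in range(epochs):
        for b in batches:                      # sequential pass over fixed batches
            g = X[b].T @ (X[b] @ w - y[b]) / len(b)
            w -= lr * g
    return w

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 4))
y = X @ np.array([1.0, -2.0, 0.5, 3.0]) + 0.05 * rng.normal(size=1000)
print(np.round(fmgd(X, y), 2))   # approximately [1.0, -2.0, 0.5, 3.0]
```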
Article
Automation & Control Systems
Duk-Sun Shim, Joseph Shim
Summary: This paper proposes a modified stochastic gradient descent (mSGD) algorithm that uses a random learning rate to reduce the time spent tuning the learning rate. Experiments show that the mSGD algorithm converges better than SGD and slightly better than the AdaGrad and Adam algorithms.
INTERNATIONAL JOURNAL OF CONTROL AUTOMATION AND SYSTEMS
(2023)
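The summary does not state which distribution mSGD draws its learning rate from; the sketch below assumes a uniform draw over an interval purely for illustration, which sidesteps a fine-grained learning-rate search.

```python
import numpy as np

def msgd_random_lr(X, y, lr_low=0.01, lr_high=0.3, steps=5000, seed=0):
    """SGD where each step's learning rate is drawn at random from an
    interval instead of being tuned to a single value."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        i = rng.integers(n)
        lr = rng.uniform(lr_low, lr_high)      # random learning rate per step
        w -= lr * X[i] * (X[i] @ w - y[i])     # stochastic gradient step
    return w

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.05 * rng.normal(size=500)
print(np.round(msgd_random_lr(X, y), 2))   # approximately [2.0, -1.0, 0.5]
```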
Article
Biology
Ruijian Han, Lan Luo, Yuanyuan Lin, Jian Huang
Summary: We propose a debiased stochastic gradient descent algorithm for online statistical inference with high-dimensional data that efficiently constructs confidence intervals, reduces time and space complexity, and achieves nominal coverage probability.
Article
Automation & Control Systems
Weijie J. Su, Yuancheng Zhu
Summary: This paper introduces a novel online learning method, HiGrad, which conducts statistical inference for online learning at no additional computational cost over SGD. The HiGrad procedure splits the iterate sequence into threads and equips predictions with a t-based confidence interval derived from a new covariance structure, achieving asymptotically exact coverage probability under certain conditions. Extensive simulation studies and a real data example demonstrate the effectiveness of HiGrad.
JOURNAL OF MACHINE LEARNING RESEARCH
(2023)
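A simplified sketch of the thread-splitting idea, assuming a single split into K threads and a naive t-interval over the thread averages; the actual HiGrad procedure corrects the interval using the covariance induced by the shared segment, which this sketch omits.

```python
import numpy as np

def higrad_sketch(sample, T0=500, K=4, T1=2000,
                  lr=lambda t: 1.0 / (1 + t) ** 0.6):
    """Estimate a mean by SGD: one shared segment, then K threads that
    continue independently; t-interval from the K thread averages."""
    rng = np.random.default_rng(0)
    w, t = 0.0, 0
    for _ in range(T0):                              # shared segment
        w -= lr(t) * (w - sample(rng)); t += 1
    estimates = []
    for _ in range(K):                               # split into K threads
        wk, tk, acc = w, t, 0.0
        for _ in range(T1):
            wk -= lr(tk) * (wk - sample(rng)); tk += 1
            acc += wk
        estimates.append(acc / T1)                   # averaged iterate per thread
    est = np.mean(estimates)
    # 3.182 = t_{0.975, K-1} hardcoded for K = 4
    half = 3.182 * np.std(estimates, ddof=1) / np.sqrt(K)
    return est, (est - half, est + half)

est, ci = higrad_sketch(lambda rng: rng.normal(loc=1.5))
print(round(est, 3), tuple(round(c, 3) for c in ci))   # estimate and CI near 1.5
```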
Article
Mathematics
Bodo Herzog
Summary: The aim of this article is to establish a stochastic search algorithm for neural networks based on fractional stochastic processes. Fractional stochastic processes {B_t^H, t ≥ 0}, which generalize standard Brownian motion, capture different properties in order to simulate real-world phenomena. This approach provides new insights into stochastic gradient descent (SGD) algorithms in machine learning, and convergence properties for fractional stochastic processes are exhibited.
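To make the fractional-process ingredient concrete: the sketch below samples fractional Brownian motion via a Cholesky factorization of its covariance and injects the increments as search noise in a one-dimensional gradient descent. Using it as an SGD perturbation in this form is a hypothetical illustration, not the article's exact construction; H = 0.5 recovers ordinary Brownian motion.

```python
import numpy as np

def fbm_increments(n, H, seed=0):
    """Sample B^H at times 1..n via Cholesky of the fBm covariance
    Cov(B_s, B_t) = 0.5 * (s^2H + t^2H - |t - s|^2H), then difference."""
    t = np.arange(1, n + 1)
    cov = 0.5 * (t[:, None] ** (2 * H) + t[None, :] ** (2 * H)
                 - np.abs(t[:, None] - t[None, :]) ** (2 * H))
    path = np.linalg.cholesky(cov) @ np.random.default_rng(seed).normal(size=n)
    return np.diff(path, prepend=0.0)

# Gradient descent on f(w) = (w - 2)^2 with fBm-driven noise (H = 0.7)
H, n, lr = 0.7, 1000, 0.01
noise = fbm_increments(n, H, seed=1)
w = 0.0
for k in range(n):
    w -= lr * (2 * (w - 2.0)) + 0.05 * noise[k]   # gradient step + fBm kick
print(round(w, 2))   # hovers near the minimizer w = 2
```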
Article
Computer Science, Artificial Intelligence
Alexandre Lemire Paquin, Brahim Chaib-draa, Philippe Giguere
Summary: We provide new generalization bounds for stochastic gradient descent when training classifiers with invariances. Our analysis covers both the convex and non-convex cases and is based on the stability framework. We investigate angle-wise stability, instead of Euclidean stability in the weights, and consider an invariant distance measure for neural networks. Moreover, we use on-average stability to obtain a data-dependent quantity in the bound, which proves more favorable with larger learning rates in our experiments.
Article
Automation & Control Systems
Yunwen Lei, Ting Hu, Ke Tang
Summary: This paper provides optimal capacity-independent and capacity-dependent learning rates for SGD with general convex loss functions, without bounded-subgradient or smoothness assumptions, with all bounds stated in high probability. This improvement is achieved through a refined estimate of the norm of the SGD iterates, based on martingale analysis and concentration inequalities for empirical processes.
JOURNAL OF MACHINE LEARNING RESEARCH
(2021)
Article
Computer Science, Theory & Methods
Jungang Yang, Liyao Xiang, Ruidong Chen, Weiting Li, Baochun Li
Summary: This study introduces TVG, a new (ε, δ)-differentially private mechanism for protecting privacy in tensor-valued queries, which improves utility by applying unimodal differentially private noise. Experimental results demonstrate that TVG outperforms other state-of-the-art mechanisms on tensor-valued queries.
IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY
(2022)
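TVG's unimodal noise distribution is not specified in this summary. As a baseline for comparison, here is the classical (ε, δ) Gaussian mechanism applied entrywise to a tensor-valued query; the sensitivity and privacy parameters below are assumed values.

```python
import numpy as np

def gaussian_mechanism(query_value, l2_sensitivity, eps, delta, seed=0):
    """Classical (eps, delta) Gaussian mechanism, entrywise on a tensor.
    Valid for eps <= 1 with sigma = sens * sqrt(2 ln(1.25/delta)) / eps."""
    sigma = l2_sensitivity * np.sqrt(2 * np.log(1.25 / delta)) / eps
    rng = np.random.default_rng(seed)
    return query_value + rng.normal(scale=sigma, size=query_value.shape)

# e.g. privatizing an averaged gradient tensor with assumed L2 sensitivity 1.0
grad = np.ones((3, 4, 5)) * 0.2
noisy = gaussian_mechanism(grad, l2_sensitivity=1.0, eps=1.0, delta=1e-5)
print(noisy.shape, round(float(noisy.mean()), 2))
```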
Article
Mathematics
Huimin Gao, Qingtao Wu, Hongyan Cao, Xuhui Zhao, Junlong Zhu, Mingchuan Zhang
Summary: As a new machine learning technique, federated learning has gained attention for enabling decentralized model training across data silos or intelligent edge devices in the Internet of Things without exchanging local raw data. However, existing methods based on stochastic gradient descent suffer from slow convergence and unstable performance during training.
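The summary stops before describing the paper's remedy. For context, here is the generic federated averaging (FedAvg-style) loop that such SGD-based methods build on: local SGD per client, server-side model averaging, no raw data exchanged. All data and constants are synthetic.

```python
import numpy as np

def fedavg(client_data, rounds=50, local_steps=10, lr=0.1, seed=0):
    """Each round, clients run local SGD on their own data and the
    server averages the resulting models; raw data never moves."""
    rng = np.random.default_rng(seed)
    d = client_data[0][0].shape[1]
    w_global = np.zeros(d)
    for _ in range(rounds):
        local_models = []
        for X, y in client_data:                  # in parallel in practice
            w = w_global.copy()
            for _ in range(local_steps):          # local SGD steps
                i = rng.integers(len(y))
                w -= lr * X[i] * (X[i] @ w - y[i])
            local_models.append(w)
        w_global = np.mean(local_models, axis=0)  # server-side aggregation
    return w_global

rng = np.random.default_rng(1)
w_true = np.array([1.0, -1.0, 2.0])
clients = []
for _ in range(5):
    X = rng.normal(size=(100, 3))
    clients.append((X, X @ w_true + 0.05 * rng.normal(size=100)))
print(np.round(fedavg(clients), 2))   # approximately [1.0, -1.0, 2.0]
```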
Article
Computer Science, Artificial Intelligence
Fatemeh Rahimian, Amir H. Payberah, Sarunas Girdzijauskas, Mark Jelasity, Seif Haridi
ACM TRANSACTIONS ON AUTONOMOUS AND ADAPTIVE SYSTEMS
(2015)
Article
Computer Science, Software Engineering
Lehel Nyers, Mark Jelasity
CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE
(2015)
Article
Computer Science, Artificial Intelligence
Istvan Hegedus, Arpad Berta, Levente Kocsis, Andras A. Benczur, Mark Jelasity
ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY
(2016)
Article
Computer Science, Information Systems
Gabor Danner, Arpad Berta, Istvan Hegedus, Mark Jelasity
SECURITY AND COMMUNICATION NETWORKS
(2018)
Article
Computer Science, Theory & Methods
Istvan Hegedus, Gabor Danner, Mark Jelasity
Summary: Gossip learning is a decentralized alternative to federated learning that does not require an aggregation server or central component. Despite relying on a more basic infrastructure and being less efficient, gossip learning variants perform comparably to federated learning variants in certain scenarios.
JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING
(2021)
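A minimal sketch of a gossip learning round as described: no aggregation server; each node sends its model to a random peer, and the receiver merges it with its own (here, by plain averaging) and then takes one local SGD step. The merge rule and constants are illustrative assumptions, not the paper's exact protocol.

```python
import numpy as np

def gossip_learning(node_data, rounds=300, lr=0.1, seed=0):
    """Serverless gossip learning on a least-squares task: send-to-random-peer,
    merge by averaging, then one SGD step on the receiver's local data."""
    rng = np.random.default_rng(seed)
    n_nodes = len(node_data)
    d = node_data[0][0].shape[1]
    models = [np.zeros(d) for _ in range(n_nodes)]
    for _ in range(rounds):
        for sender in range(n_nodes):
            receiver = rng.choice([k for k in range(n_nodes) if k != sender])
            merged = 0.5 * (models[receiver] + models[sender])   # merge step
            X, y = node_data[receiver]
            i = rng.integers(len(y))                             # local sample
            models[receiver] = merged - lr * X[i] * (X[i] @ merged - y[i])
    return np.mean(models, axis=0)

rng = np.random.default_rng(1)
w_true = np.array([0.5, -1.5])
nodes = []
for _ in range(10):
    X = rng.normal(size=(50, 2))
    nodes.append((X, X @ w_true + 0.05 * rng.normal(size=50)))
print(np.round(gossip_learning(nodes), 2))   # approximately [0.5, -1.5]
```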
Proceedings Paper
Computer Science, Artificial Intelligence
Istvan Megyeri, Istvan Hegedus, Mark Jelasity
2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN)
(2020)
Proceedings Paper
Computer Science, Theory & Methods
Zoltan Szabo, Krisztian Teglas, Arpad Berta, Mark Jelasity, Vilmos Bilicki
DISTRIBUTED APPLICATIONS AND INTEROPERABLE SYSTEMS, DAIS 2019
(2019)
Proceedings Paper
Computer Science, Theory & Methods
Istvan Hegedus, Gabor Danner, Mark Jelasity
DISTRIBUTED APPLICATIONS AND INTEROPERABLE SYSTEMS, DAIS 2019
(2019)
Article
Computer Science, Interdisciplinary Applications
Edward Tremel, Ken Birman, Robert Kleinberg, Mark Jelasity
ACM TRANSACTIONS ON CYBER-PHYSICAL SYSTEMS
(2019)
Proceedings Paper
Computer Science, Hardware & Architecture
Arpad Berta, Mark Jelasity
2017 25TH EUROMICRO INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED AND NETWORK-BASED PROCESSING (PDP 2017)
(2017)
Proceedings Paper
Computer Science, Theory & Methods
Istvan Hegedus, Mark Jelasity
2016 24TH EUROMICRO INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED, AND NETWORK-BASED PROCESSING (PDP)
(2016)
Proceedings Paper
Computer Science, Theory & Methods
Arpad Berta, Istvan Hegedus, Mark Jelasity
2016 24TH EUROMICRO INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED, AND NETWORK-BASED PROCESSING (PDP)
(2016)
Proceedings Paper
Computer Science, Hardware & Architecture
Istvan Hegedus, Mark Jelasity, Levente Kocsis, Andras A. Benczur
14TH IEEE INTERNATIONAL CONFERENCE ON PEER-TO-PEER COMPUTING (P2P)
(2014)
Proceedings Paper
Computer Science, Hardware & Architecture
Arpad Berta, Vilmos Bilicki, Mark Jelasity
14TH IEEE INTERNATIONAL CONFERENCE ON PEER-TO-PEER COMPUTING (P2P)
(2014)
Proceedings Paper
Computer Science, Hardware & Architecture
Roberto Roverso, Jim Dowling, Mark Jelasity
13TH IEEE INTERNATIONAL CONFERENCE ON PEER-TO-PEER COMPUTING (P2P)
(2013)