Review
Mathematics
Yingjie Tian, Yuqi Zhang, Haibin Zhang
Summary: In the age of artificial intelligence, finding the best approach to handling massive data is a challenging task. Stochastic gradient descent (SGD) stands out among machine learning optimization methods as simple yet highly effective. This study examines various contemporary deep learning applications, including natural language processing (NLP), visual data processing, and voice and audio processing. The study also presents SGD and its variants as available in the PyTorch optimizer library, such as SGD, Adagrad, Adadelta, RMSprop, Adam, and AdamW. Additionally, theoretical conditions for the applicability of these methods are proposed, highlighting the gap between theoretical convergence guarantees and practical implementation.
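For orientation, all of the optimizers the review surveys ship with torch.optim; a minimal sketch of how each is instantiated follows (the toy model and hyperparameter values are placeholders, not the review's settings):

import torch
import torch.nn as nn

model = nn.Linear(10, 2)           # toy model standing in for any network
params = list(model.parameters())  # materialize so each optimizer gets the full list

optimizers = {
    "SGD":      torch.optim.SGD(params, lr=0.01, momentum=0.9),
    "Adagrad":  torch.optim.Adagrad(params, lr=0.01),
    "Adadelta": torch.optim.Adadelta(params, rho=0.9),
    "RMSprop":  torch.optim.RMSprop(params, lr=0.001, alpha=0.99),
    "Adam":     torch.optim.Adam(params, lr=0.001, betas=(0.9, 0.999)),
    "AdamW":    torch.optim.AdamW(params, lr=0.001, weight_decay=0.01),
}

x, y = torch.randn(8, 10), torch.randint(0, 2, (8,))
opt = optimizers["Adam"]
opt.zero_grad()
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
opt.step()  # one SGD-family update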
Article
Computer Science, Artificial Intelligence
Anuraganand Sharma
Summary: The proposed guided SGD algorithm compensates for the gradient deviation caused by delay and uses consistent examples to steer the convergence of SGD, reducing the impact of delay on neural network models.
APPLIED SOFT COMPUTING
(2021)
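As a generic point of comparison only (this is a common staleness-damping heuristic, not Sharma's guided-SGD correction, which steers convergence via consistent examples), the basic problem and a naive fix look like this:

import numpy as np

rng = np.random.default_rng(0)
w = np.zeros(3)
history = [w.copy()]
for t in range(300):
    delay = rng.integers(0, 4)                        # worker read stale weights
    w_stale = history[max(0, len(history) - 1 - delay)]
    grad = w_stale - np.array([1.0, 2.0, 3.0]) + rng.normal(0, 0.1, 3)
    w = w - (0.1 / (1 + delay)) * grad                # damp stale gradients by delay
    history.append(w.copy())
print(w)  # near [1, 2, 3] despite delayed gradients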
Article
Automation & Control Systems
Yi-Rui Yang, Wu-Jun Li
Summary: Distributed learning is widely used in various fields such as cluster-based large-scale learning, federated learning, and edge computing. Byzantine learning, which deals with failure or attack in distributed learning, has gained attention recently. This paper proposes a novel method called buffered asynchronous stochastic gradient descent (BASGD) for asynchronous Byzantine learning (ABL). BASGD is the first ABL method that can resist non-omniscient attacks without storing any instances on the server. An improved variant of BASGD, called BASGD with momentum (BASGDm), is also introduced. Both BASGD and BASGDm have a wider scope of application than existing ABL methods, and both are proven to converge and to resist failure or attack. Empirical results demonstrate the superior performance of our methods compared to existing ABL baselines in the presence of failure or attack.
JOURNAL OF MACHINE LEARNING RESEARCH
(2023)
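A simplified sketch of the buffering idea described above (the buffer assignment and the coordinate-wise-median aggregation rule are illustrative choices, not the paper's exact specification): gradients arriving asynchronously are pooled into B buffers, and the server updates only once every buffer is non-empty.

import numpy as np

class BufferedServer:
    def __init__(self, dim, num_buffers, lr=0.1):
        self.w = np.zeros(dim)
        self.B = num_buffers
        self.lr = lr
        self.buffers = [[] for _ in range(num_buffers)]

    def receive(self, worker_id, grad):
        self.buffers[worker_id % self.B].append(grad)
        if all(len(buf) > 0 for buf in self.buffers):
            means = np.stack([np.mean(buf, axis=0) for buf in self.buffers])
            robust_grad = np.median(means, axis=0)  # resists a minority of bad buffers
            self.w -= self.lr * robust_grad
            self.buffers = [[] for _ in range(self.B)]

server = BufferedServer(dim=5, num_buffers=3)
rng = np.random.default_rng(0)
for t in range(30):
    wid = rng.integers(0, 9)                        # 9 asynchronous workers
    grad = server.w - 1.0 + rng.normal(0, 0.1, 5)   # noisy gradient of ||w - 1||^2 / 2
    if wid == 8:
        grad = rng.normal(0, 100, 5)                # one Byzantine worker sends garbage
    server.receive(wid, grad)
print(server.w)  # drifts toward 1 despite the attacker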
Article
Computer Science, Theory & Methods
Hao Zhang, Tingting Wu, Zhifeng Ma, Feng Li, Jie Liu
Summary: Distributed stochastic gradient descent (SGD) algorithms are commonly used to speed up deep learning model training by employing multiple computational devices in parallel. Top-k sparsification is an effective method to reduce communication overhead, but traditional implementations have limitations in training efficiency and model performance. This paper introduces a Dynamic Layer-wise Sparsification (DLS) mechanism and its extensions, DLS(s), which balance efficiency and performance by adjusting the sparsity ratios of individual layers. Experimental results show that DLS(s) outperforms existing methods in terms of both performance and training time.
FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE
(2023)
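A minimal sketch of the baseline top-k gradient sparsification with error feedback; the DLS idea would replace the single fixed ratio below with a per-layer, dynamically adjusted value, a detail this sketch omits:

import torch

def topk_sparsify(grad, ratio, residual):
    """Keep the largest-magnitude entries of (grad + residual); carry the rest over."""
    acc = grad + residual
    k = max(1, int(acc.numel() * ratio))
    flat = acc.flatten()
    idx = flat.abs().topk(k).indices
    sparse = torch.zeros_like(flat)
    sparse[idx] = flat[idx]
    new_residual = (flat - sparse).view_as(acc)  # error feedback for next round
    return sparse.view_as(acc), new_residual

g = torch.randn(4, 4)
res = torch.zeros_like(g)
sparse_g, res = topk_sparsify(g, ratio=0.25, residual=res)
print(sparse_g)  # only 4 of 16 entries would be transmitted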
Article
Computer Science, Artificial Intelligence
Kazuki Osawa, Yohei Tsuji, Yuichiro Ueno, Akira Naruse, Chuan-Sheng Foo, Rio Yokota
Summary: This paper proposes a scalable and practical natural gradient descent (SP-NGD) method for large-scale distributed training of deep neural networks. It achieves similar generalization performance to models trained with first-order optimization methods, but with accelerated convergence.
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
(2022)
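For context, a generic natural-gradient update on logistic regression (not SP-NGD itself, whose distributed and approximation machinery this sketch omits): the gradient is preconditioned with a damped empirical Fisher matrix instead of being used raw.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X @ np.array([1.0, -2.0, 0.5]) > 0).astype(float)
w = np.zeros(3)

for step in range(100):
    p = 1.0 / (1.0 + np.exp(-X @ w))
    per_example_grads = (p - y)[:, None] * X              # d(logloss)/dw per example
    g = per_example_grads.mean(axis=0)
    F = per_example_grads.T @ per_example_grads / len(X)  # empirical Fisher
    g_nat = np.linalg.solve(F + 1e-3 * np.eye(3), g)      # damping for stability
    w -= 0.5 * g_nat

print(w)  # roughly aligned with the true coefficient direction [1, -2, 0.5]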
Article
Computer Science, Artificial Intelligence
Anirban Das, Timothy Castiglia, Shiqiang Wang, Stacy Patterson
Summary: This study applies federated learning to tiered communication networks and proposes a communication-efficient decentralized training algorithm for two-tiered networks. The algorithm is validated through theoretical analysis and empirical experiments.
ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY
(2022)
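A hedged sketch of two-tier (hierarchical) federated averaging of the kind the paper studies: clients average within each edge group, then the groups are averaged at the global server. The group layout and weighting by sample counts are illustrative choices.

import numpy as np

def weighted_average(models, weights):
    weights = np.asarray(weights, dtype=float)
    weights /= weights.sum()
    return sum(wt * m for wt, m in zip(weights, models))

# tier 1: clients -> edge aggregators
client_models = [np.random.default_rng(i).normal(size=4) for i in range(6)]
client_sizes = [100, 50, 80, 120, 60, 90]
groups = [(0, 1, 2), (3, 4, 5)]  # which clients report to which edge node

edge_models, edge_sizes = [], []
for g in groups:
    edge_models.append(weighted_average([client_models[i] for i in g],
                                        [client_sizes[i] for i in g]))
    edge_sizes.append(sum(client_sizes[i] for i in g))

# tier 2: edge aggregators -> global server
global_model = weighted_average(edge_models, edge_sizes)
print(global_model)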
Article
Computer Science, Artificial Intelligence
Alexandre Lemire Paquin, Brahim Chaib-draa, Philippe Giguere
Summary: We provide new generalization bounds for stochastic gradient descent in training classifiers with invariances. Our analysis covers both convex and non-convex cases and is based on the stability framework. We investigate angle-wise stability, instead of Euclidean stability in the weights, and consider an invariant distance measure for neural networks. Moreover, we utilize on-average stability to obtain a data-dependent quantity in the bound, which proves more favorable with larger learning rates in our experiments.
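For illustration only, the kind of quantity angle-wise stability tracks is the angle between two weight vectors rather than the Euclidean norm of their difference:

import numpy as np

def angle_distance(w1, w2):
    cos = np.dot(w1, w2) / (np.linalg.norm(w1) * np.linalg.norm(w2))
    return np.arccos(np.clip(cos, -1.0, 1.0))

w, w_perturbed = np.array([1.0, 0.0]), np.array([1.0, 0.1])
print(angle_distance(w, w_perturbed))  # small angle regardless of norm differences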
Article
Computer Science, Theory & Methods
Jungang Yang, Liyao Xiang, Ruidong Chen, Weiting Li, Baochun Li
Summary: This study introduces a new (ε, δ)-differential privacy mechanism, TVG, for protecting privacy in tensor-valued queries, improving utility by applying unimodal differentially private noise. Experimental results demonstrate that TVG outperforms other state-of-the-art mechanisms on tensor-valued queries.
IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY
(2022)
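For orientation only, the standard (ε, δ)-DP Gaussian mechanism applied to a tensor-valued query; TVG's optimized noise distribution differs, and the L2-sensitivity is assumed known here:

import numpy as np

def gaussian_mechanism(query_result, sensitivity, epsilon, delta, rng):
    # classic calibration (Dwork & Roth); valid for epsilon < 1
    sigma = np.sqrt(2.0 * np.log(1.25 / delta)) * sensitivity / epsilon
    return query_result + rng.normal(0.0, sigma, size=query_result.shape)

rng = np.random.default_rng(0)
tensor = np.arange(24, dtype=float).reshape(2, 3, 4)  # toy tensor-valued query
private = gaussian_mechanism(tensor, sensitivity=1.0, epsilon=0.5, delta=1e-5, rng=rng)
print(private - tensor)  # the added noise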
Article
Computer Science, Artificial Intelligence
Kensuke Nakamura, Stefano Soatto, Byung-Woo Hong
Summary: BCSC is a stochastic first-order optimization algorithm that adds a cyclic constraint to the selection of data and parameters, resulting in higher accuracy in image classification. It effectively limits the impact of outliers in the training set and provides better generalization performance within the same number of update iterations.
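A rough sketch of a block-cyclic update schedule in the spirit of BCSC: data and parameters are split into blocks, and each step updates one parameter block on one data block, cycling so every pairing is visited. The block counts and the least-squares problem are arbitrary stand-ins.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(12, 4))
y = X @ np.array([1.0, -1.0, 2.0, 0.5]) + rng.normal(0, 0.01, 12)
w = np.zeros(4)

data_blocks = np.array_split(np.arange(12), 3)       # 3 data blocks
param_blocks = [np.array([0, 1]), np.array([2, 3])]  # 2 parameter blocks

for epoch in range(200):
    for d, db in enumerate(data_blocks):
        pb = param_blocks[(epoch + d) % len(param_blocks)]  # cyclic pairing
        residual = X[db] @ w - y[db]
        grad_block = X[db][:, pb].T @ residual / len(db)
        w[pb] -= 0.1 * grad_block
print(w)  # approaches [1, -1, 2, 0.5]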
Article
Computer Science, Artificial Intelligence
Guozhang Chen, Cheng Kevin Qu, Pulin Gong
Summary: This study reveals the effectiveness of stochastic gradient descent (SGD) in deep learning by investigating its interactions with the geometrical structure of the loss landscape. The study finds that SGD exhibits rich, complex dynamics with superdiffusion in the initial learning phase and subdiffusion at long times. These learning dynamics are observed in different types of deep neural networks and are independent of batch size and learning rate settings. The superdiffusion process is attributed to the interactions between SGD and fractal-like regions of the loss landscape.
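A sketch of the diffusion analysis behind such a study: record the weight vector over training, compute the mean squared displacement MSD(t) ~ t^alpha, and fit alpha; alpha > 1 signals superdiffusion and alpha < 1 subdiffusion. The random walk below is only a stand-in trajectory and yields alpha near 1.

import numpy as np

rng = np.random.default_rng(0)
traj = np.cumsum(rng.normal(size=(2000, 10)), axis=0)  # stand-in weight trajectory
lags = np.unique(np.logspace(0, 3, 20).astype(int))
msd = [np.mean(np.sum((traj[lag:] - traj[:-lag]) ** 2, axis=1)) for lag in lags]
alpha = np.polyfit(np.log(lags), np.log(msd), 1)[0]
print(alpha)  # fitted diffusion exponent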
Article
Automation & Control Systems
Chris Mingard, Guillermo Valle-Perez, Joar Skalse, Ard A. Louis
Summary: The study found that deep neural networks exhibit a strong inductive bias in the overparameterised regime, primarily due to the characteristics of the parameter-function map. The Bayesian posterior probability is a key factor influencing DNNs' generalisation ability and is closely related to the performance of stochastic gradient descent.
JOURNAL OF MACHINE LEARNING RESEARCH
(2021)
Article
Computer Science, Artificial Intelligence
Yawen Li, Wenling Li, Zhe Xue
Summary: This paper investigates the distributed federated learning problem with quantized exchanged information. A novel quantized federated averaging algorithm is proposed and analyzed for both convex and strongly convex loss functions. Extensive experiments using realistic data are conducted to validate the effectiveness of the algorithm.
INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS
(2022)
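A sketch of the quantization step in quantized federated averaging: each client quantizes its model update to a small number of levels before sending it to the server. The unbiased 8-level uniform quantizer here is illustrative, not the paper's scheme.

import numpy as np

def stochastic_quantize(v, levels, rng):
    """Unbiased uniform quantizer: E[quantize(v)] == v."""
    lo, hi = v.min(), v.max()
    if hi == lo:
        return v.copy()
    scaled = (v - lo) / (hi - lo) * (levels - 1)
    floor = np.floor(scaled)
    up = rng.random(v.shape) < (scaled - floor)  # round up with prob = fractional part
    q = floor + up
    return lo + q / (levels - 1) * (hi - lo)

rng = np.random.default_rng(0)
updates = [rng.normal(size=8) for _ in range(4)]                 # 4 clients' updates
quantized = [stochastic_quantize(u, levels=8, rng=rng) for u in updates]
avg = np.mean(quantized, axis=0)                                 # server-side averaging
print(np.abs(avg - np.mean(updates, axis=0)).max())              # small quantization error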
Article
Computer Science, Artificial Intelligence
Pengzhan Guo, Zeyang Ye, Keli Xiao, Wei Zhu
Summary: This paper investigates stochastic optimization in deep learning and proposes a scalable parallel algorithm. The algorithm improves the objective function in neural network models and introduces a new parallel computing strategy for accelerating the training process. Experimental results demonstrate the significant advantages of the algorithm in accelerating deep architecture training.
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING
(2022)
Article
Automation & Control Systems
Duk-Sun Shim, Joseph Shim
Summary: This paper proposes a modified stochastic gradient descent (mSGD) algorithm that uses a random learning rate, reducing the time spent tuning the learning rate. Experiments show that the mSGD algorithm converges better than SGD, and slightly better than the AdaGrad and Adam algorithms.
INTERNATIONAL JOURNAL OF CONTROL AUTOMATION AND SYSTEMS
(2023)
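A minimal sketch of the random-learning-rate idea: each step draws the step size from a fixed distribution instead of tuning a schedule. The uniform range below is an illustrative guess, not the paper's distribution.

import numpy as np

rng = np.random.default_rng(0)
w = np.array([5.0, -3.0])
for t in range(500):
    grad = w - np.array([1.0, 2.0])   # gradient of ||w - (1, 2)||^2 / 2
    grad += rng.normal(0, 0.1, 2)     # stochastic gradient noise
    lr = rng.uniform(0.0, 0.2)        # random learning rate per step
    w -= lr * grad
print(w)  # close to [1, 2] without any learning-rate tuning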
Article
Mathematics
Bodo Herzog
Summary: The aim of this article is to establish a stochastic search algorithm for neural networks based on fractional stochastic processes. Fractional stochastic processes {B_t^H, t ≥ 0}, which generalize standard Brownian motion, capture different properties in order to simulate real-world phenomena. This approach provides new insights into stochastic gradient descent (SGD) algorithms in machine learning, and convergence properties for fractional stochastic processes are exhibited.
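A sketch of exact simulation of fractional Brownian motion B_t^H on a grid via the Cholesky factor of its covariance Cov(B_s^H, B_t^H) = (s^{2H} + t^{2H} - |t - s|^{2H}) / 2; the Hurst index H controls the roughness and memory such an approach exploits. The grid size and H value are arbitrary.

import numpy as np

def fbm(n_steps, hurst, T=1.0, rng=None):
    rng = rng or np.random.default_rng()
    t = np.linspace(T / n_steps, T, n_steps)
    s, u = np.meshgrid(t, t)
    cov = 0.5 * (s**(2 * hurst) + u**(2 * hurst) - np.abs(s - u)**(2 * hurst))
    L = np.linalg.cholesky(cov + 1e-10 * np.eye(n_steps))  # jitter for stability
    return t, L @ rng.standard_normal(n_steps)

t, path = fbm(256, hurst=0.7, rng=np.random.default_rng(0))
print(path[:5])  # H = 0.5 recovers standard Brownian motion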
Article
Computer Science, Hardware & Architecture
Saad Aljubayrin, Jianzhong Qi, Christian S. Jensen, Rui Zhang, Zhen He, Yuan Li
Article
Computer Science, Information Systems
Shengxun Yang, Zhen He, Yi-Ping Phoebe Chen
INFORMATION SCIENCES
(2018)
Article
Substance Abuse
Emmanuel Kuntsche, Abraham Albert Bonela, Gabriel Caluzzi, Mia Miller, Zhen He
DRUG AND ALCOHOL DEPENDENCE
(2020)
Article
Computer Science, Theory & Methods
Matthias Langer, Zhen He, Wenny Rahayu, Yanbo Xue
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS
(2020)
Article
Computer Science, Artificial Intelligence
Ashley Hall, Brandon Victor, Zhen He, Matthias Langer, Marc Elipot, Aiden Nibali, Stuart Morgan
Summary: Swimming coaches must analyze swimmers' performance to adjust race strategy, but the required statistics come from time-consuming manual video annotation. A two-phase deep learning approach called DeepDASH and a hierarchical tracking algorithm called HISORT are proposed to solve computer vision tasks in swimming videos, achieving significant improvements in swimmer head detection, tracking, and stroke detection.
NEURAL COMPUTING & APPLICATIONS
(2021)
Article
Biochemical Research Methods
Robert T. Furbank, Viridiana Silva-Perez, John R. Evans, Anthony G. Condon, Gonzalo M. Estavillo, Wennan He, Saul Newman, Richard Poire, Ashley Hall, Zhen He
Summary: The study demonstrates that the accuracy of predicting wheat photosynthetic and leaf traits using deep learning and ensemble models can be improved over partial least squares regression (PLSR) without overfitting. These models can be flexibly applied across different spectral ranges without compromising accuracy.
Article
Biology
Min Luo, Zhen He, Hui Cui, Yi-Ping Phoebe Chen, Phillip Ward
Summary: We propose a novel attention transfer method for accurately predicting the progression of Alzheimer's disease (AD) in patients with mild cognitive impairment (MCI). Our method trains a 3D convolutional neural network to automatically learn regions of interest (ROI) from images and transfer attention maps instead of model weights. Our method outperformed traditional transfer learning and methods using expert knowledge to define ROI, and the attention map revealed Alzheimer's pathology.
COMPUTERS IN BIOLOGY AND MEDICINE
(2023)
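A hedged sketch of transferring attention maps rather than weights, in the style of attention-transfer distillation (Zagoruyko & Komodakis): a student network is trained to match the teacher's spatial attention (channel-summed squared activations) alongside its task loss. The paper's 3D-CNN and ROI specifics are omitted, and the feature shapes below are placeholders.

import torch
import torch.nn.functional as F

def attention_map(feat):
    """Collapse channels to a normalized spatial attention map."""
    a = feat.pow(2).sum(dim=1)  # (N, H, W)
    return F.normalize(a.flatten(1), dim=1)

def attention_transfer_loss(student_feat, teacher_feat, logits, labels, beta=1e3):
    task = F.cross_entropy(logits, labels)
    at = (attention_map(student_feat) - attention_map(teacher_feat)).pow(2).mean()
    return task + beta * at

s_feat, t_feat = torch.randn(4, 8, 7, 7), torch.randn(4, 16, 7, 7)
logits, labels = torch.randn(4, 3), torch.randint(0, 3, (4,))
print(attention_transfer_loss(s_feat, t_feat, logits, labels))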
Proceedings Paper
Engineering, Electrical & Electronic
Aiden Nibali, Zhen He, Stuart Morgan, Luke Prendergast
2019 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV)
(2019)
Proceedings Paper
Computer Science, Artificial Intelligence
Aiden Nibali, Zhen He, Stuart Morgan, Daniel Greenwood
2017 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW)
(2017)