Article

Speeding Up Distributed Machine Learning Using Codes

Journal

IEEE TRANSACTIONS ON INFORMATION THEORY
Volume 64, Issue 3, Pages 1514-1529

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TIT.2017.2736066

Keywords

Algorithm design and analysis; channel coding; distributed computing; distributed databases; encoding; machine learning algorithms; multicast communication; robustness; runtime

Funding

  1. Institute for Information & Communications Technology Promotion (IITP) - Korea government (MSIT) [20170-00694]
  2. Brain Korea 21 Plus Project
  3. NSF CIF [1703678]
  4. Directorate for Computer & Information Science & Engineering (NSF)
  5. Division of Computing and Communication Foundations [1703678] Funding Source: National Science Foundation

Abstract

Codes are widely used in many engineering applications to offer robustness against noise. In large-scale systems, there are several types of noise that can affect the performance of distributed machine learning algorithms (straggler nodes, system failures, or communication bottlenecks), but there has been little interaction cutting across codes, machine learning, and distributed systems. In this paper, we provide theoretical insights on how coded solutions can achieve significant gains compared with uncoded ones. We focus on two of the most basic building blocks of distributed learning algorithms: matrix multiplication and data shuffling. For matrix multiplication, we use codes to alleviate the effect of stragglers and show that if the number of homogeneous workers is n, and the runtime of each subtask has an exponential tail, coded computation can speed up distributed matrix multiplication by a factor of log n. For data shuffling, we use codes to reduce communication bottlenecks, exploiting the excess in storage. We show that when a constant fraction α of the data matrix can be cached at each worker, and n is the number of workers, coded shuffling reduces the communication cost by a factor of (α + 1/n)·γ(n) compared with uncoded shuffling, where γ(n) is the ratio of the cost of unicasting n messages to n users to that of multicasting a common message (of the same size) to n users. For instance, γ(n) ≃ n if multicasting a message to n users is as cheap as unicasting a message to one user. We also provide experimental results corroborating the theoretical gains of the coded algorithms.
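
For illustration only (not code from the paper), the following minimal Python sketch shows the coded-computation idea the abstract describes for matrix multiplication: the data matrix is split into k row blocks, encoded into n coded blocks with a random (n, k) MDS-style generator matrix, and the full product is recovered from any k of the n worker results, so up to n - k stragglers can be ignored. All sizes, names, and the choice of a random Gaussian generator matrix are assumptions made for this sketch.

import numpy as np

rng = np.random.default_rng(0)

m, d = 1200, 50          # A is m x d; x is a d-vector; goal is y = A @ x
n, k = 6, 4              # n workers; any k responses suffice (tolerates n - k stragglers)

A = rng.standard_normal((m, d))
x = rng.standard_normal(d)

# Split A into k row blocks and encode them into n coded blocks.
# A random Gaussian generator matrix G has every k x k submatrix invertible
# with probability 1, which is the MDS-type property needed for decoding
# from any k workers.
blocks = np.split(A, k)                                   # each block is (m/k) x d
G = rng.standard_normal((n, k))                           # encoding matrix
coded = [sum(G[i, j] * blocks[j] for j in range(k)) for i in range(n)]

# Worker i would compute coded[i] @ x; here we simulate stragglers by
# assuming only the first k workers respond.
finished = list(range(k))
partials = np.stack([coded[i] @ x for i in finished])     # k x (m/k)

# Decode: solve the k x k system given by the rows of G for the finished
# workers, recovering the uncoded block products, then concatenate.
decoded = np.linalg.solve(G[finished, :], partials)       # row j equals blocks[j] @ x
y_coded = decoded.reshape(-1)

assert np.allclose(y_coded, A @ x)

In this sketch each worker handles 1/k of the rows, and the master never waits for the slowest n - k workers; under the exponential-tail runtime model in the abstract, choosing k appropriately yields the stated Θ(log n) speedup over the uncoded scheme.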

