☆ 4.5 Article

Short-Dot: Computing Large Linear Transforms Distributedly Using Coded Short Dot Products

IEEE TRANSACTIONS ON INFORMATION THEORY (2019)

Journal

IEEE TRANSACTIONS ON INFORMATION THEORY

Volume 65, Issue 10, Pages 6171-6193

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/TIT.2019.2927558

Keywords

Algorithm-based fault tolerance; coded computing; matrix sparsification; stragglers

Funding

Systems on Nanoscale Information fabriCs (SONIC), one of the six SRC STARnet Center - MARCO
Systems on Nanoscale Information fabriCs (SONIC), one of the six SRC STARnet Center - DARPA
NSF [1350314, 1464336, 1553248, 1763657]
Direct For Computer & Info Scie & Enginr
Division of Computing and Communication Foundations [1350314] Funding Source: National Science Foundation
Direct For Computer & Info Scie & Enginr
Division of Computing and Communication Foundations [1464336, 1553248, 1763657] Funding Source: National Science Foundation

Ask authors/readers for more resources

Protocol

Community support

Reagent

Community support

Abstract

We consider the problem of computing a matrix-vector product Ax using a set of P parallel or distributed processing nodes prone to straggling, i.e., unpredictable delays. Every processing node can access only a fraction (s/N) of the N-length vector x, and all processing nodes compute an equal number of dot products. We propose a novel error correcting code-that we call Short-Dot-that introduces redundant, shorter dot products such that only a subset of the nodes' outputs are sufficient to compute Ax. To address the problem of straggling in computing matrix-vector products, prior work uses replication or erasure coding to encode parts of the matrix A, but the length of the dot products computed at each processing node is still N. The key novelty in our work is that instead of computing the long dot products as required in the original matrix-vector product, we construct a larger number of redundant and short dot products that only require a fraction of x to be accessed during the computation. Short-Dot is thus useful in a communication-constrained scenario as it allows for only a fraction of x to be accessed by each processing node. Further, we show that in the particular regime where the number of available processing nodes is greater than the total number of dot products, Short-Dot has lower expected computation time under straggling under an exponential model compared to existing strategies, e.g. replication, in a scaling sense. We also derive fundamental limits on the trade-off between the length of the dot products and the recovery threshold, i.e., the required number of processing nodes, showing that Short-Dot is near-optimal.

Short-Dot: Computing Large Linear Transforms Distributedly Using Coded Short Dot Products

Journal

IEEE TRANSACTIONS ON INFORMATION THEORY

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

Keywords

Categories

Funding

Ask authors/readers for more resources

Protocol

Reagent

Authors

I am an author on this paper

Reviews

Primary Rating

Secondary Ratings

Novelty

Significance

Scientific rigor

Rate this paper

Recommended

Short-Dot: Computing Large Linear Transforms Distributedly Using Coded Short Dot Products

Journal

IEEE TRANSACTIONS ON INFORMATION THEORY

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

Keywords

Categories

Funding

Ask authors/readers for more resources

Protocol

Reagent

Authors

I am an author on this paper

Reviews

Primary Rating

Secondary Ratings

Novelty

Significance

Scientific rigor

Rate this paper

Recommended

Export Citation

Share Paper