☆ 4.5 Article

Parallel Reproducible Summation

IEEE TRANSACTIONS ON COMPUTERS (2015)

期刊

IEEE TRANSACTIONS ON COMPUTERS

卷 64, 期 7, 页码 2060-2070

出版社

IEEE COMPUTER SOC

DOI: 10.1109/TC.2014.2345391

关键词

Reproducibility; summation; floating-point; rounding error; parallel computing; numerical analysis

类别

Computer Science, Hardware & Architecture Engineering, Electrical & Electronic

资金

National Science Foundation (NSF) [NSF OCI-1032639, NSF ACI-1339676]
US Department of Energy (DOE) [DOE DE-SC0010200, DOE DE-SC0003959, DOE DE-SC0005136, DOE DE-SC0008699, DOE DE-SC0008700, DOE DE-SC0004938, DOE AC02-05CH11231]
DARPA [HR0011-12-2-0016]
Intel
Google
Nokia
NVIDIA
Oracle
Direct For Computer & Info Scie & Enginr [1339676] Funding Source: National Science Foundation
Office of Advanced Cyberinfrastructure (OAC) [1339676] Funding Source: National Science Foundation
U.S. Department of Energy (DOE) [DE-SC0008699] Funding Source: U.S. Department of Energy (DOE)

向作者/读者索取更多资源

Protocol

社区支持

Reagent

社区支持

摘要

Reproducibility, i.e. getting bitwise identical floating point results from multiple runs of the same program, is a property that many users depend on either for debugging or correctness checking in many codes [10]. However, the combination of dynamic scheduling of parallel computing resources, and floating point nonassociativity, makes attaining reproducibility a challenge even for simple reduction operations like computing the sum of a vector of numbers in parallel. We propose a technique for floating point summation that is reproducible independent of the order of summation. Our technique uses Rump's algorithm for error-free vector transformation [7], and is much more efficient than using (possibly very) high precision arithmetic. Our algorithm reproducibly computes highly accurate results with an absolute error bound of n . 2(-28) . macheps . max(i) vertical bar v(i)vertical bar at a cost of 7n FLOPs and a small constant amount of extra memory usage. Higher accuracies are also possible by increasing the number of error-free transformations. As long as all operations are performed in to-nearest rounding mode, results computed by the proposed algorithms are reproducible for any run on any platform. In particular, our algorithm requires the minimum number of reductions, i.e. one reduction of an array of six double precision floating point numbers per sum, and hence is well suited for massively parallel environments.

作者

我是这篇论文的作者

点击您的名字以认领此论文并将其添加到您的个人资料中。

主要评分

4.5

评分不足

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

Toward Accurate and Fast Summation

Marko Lange

Summary: The study introduces a new accurate summation algorithm based on error-free summation, providing faithfully rounded floating-point approximations. The algorithm is compared with other accurate and high-precision summation approaches.

ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE (2022)