Article

Parallel Reproducible Summation

Journal

IEEE TRANSACTIONS ON COMPUTERS
Volume 64, Issue 7, Pages 2060-2070

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TC.2014.2345391

Keywords

Reproducibility; summation; floating-point; rounding error; parallel computing; numerical analysis

Funding

  1. National Science Foundation (NSF) [NSF OCI-1032639, NSF ACI-1339676]
  2. US Department of Energy (DOE) [DOE DE-SC0010200, DOE DE-SC0003959, DOE DE-SC0005136, DOE DE-SC0008699, DOE DE-SC0008700, DOE DE-SC0004938, DOE AC02-05CH11231]
  3. DARPA [HR0011-12-2-0016]
  4. Intel
  5. Google
  6. Nokia
  7. NVIDIA
  8. Oracle

Abstract

Reproducibility, i.e., getting bitwise identical floating-point results from multiple runs of the same program, is a property that many users depend on for debugging or correctness checking in many codes [10]. However, the combination of dynamic scheduling of parallel computing resources and floating-point nonassociativity makes attaining reproducibility a challenge even for simple reduction operations like computing the sum of a vector of numbers in parallel. We propose a technique for floating-point summation that is reproducible independent of the order of summation. Our technique uses Rump's algorithm for error-free vector transformation [7], and is much more efficient than using (possibly very) high precision arithmetic. Our algorithm reproducibly computes highly accurate results with an absolute error bound of n · 2^(-28) · macheps · max_i |v_i| at a cost of 7n FLOPs and a small constant amount of extra memory usage. Higher accuracies are also possible by increasing the number of error-free transformations. As long as all operations are performed in round-to-nearest mode, results computed by the proposed algorithms are reproducible for any run on any platform. In particular, our algorithm requires the minimum number of reductions, i.e., one reduction of an array of six double-precision floating-point numbers per sum, and hence is well suited for massively parallel environments.
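The core idea behind order-independent summation can be illustrated with a single level of error-free extraction in the spirit of Rump's vector transformation: each summand is pre-rounded to a common bit boundary chosen from the vector's maximum magnitude and length, so that every subsequent addition is exact and hence immune to reordering. The sketch below is illustrative only and is not the authors' full 1-Reduction algorithm; the function name `repro_sum_sketch` and the choice of one extraction level are assumptions for demonstration.

```python
import math
import random

def repro_sum_sketch(v):
    """One level of pre-rounding extraction (illustrative sketch,
    not the paper's full algorithm). Returns (high, lows), where
    `high` is bitwise identical for any ordering of `v` and `lows`
    holds the exact remainders for further extraction levels."""
    n = len(v)
    m = max((abs(x) for x in v), default=0.0)
    if m == 0.0:
        return 0.0, [0.0] * n
    e = math.frexp(m)[1]                     # smallest e with m < 2**e
    sigma = e + n.bit_length() - 52          # keep ~(52 - log2 n) leading bits
    boundary = math.ldexp(1.5, sigma + 52)   # ulp(boundary) = 2**sigma
    # Error-free extraction: q is x rounded to a multiple of 2**sigma,
    # and both operations below are exact in binary64 arithmetic.
    highs = [(x + boundary) - boundary for x in v]
    lows = [x - q for x, q in zip(v, highs)]  # exact remainders
    s = 0.0
    for q in highs:
        s += q   # every partial sum is exact => order-independent
    return s, lows

# Reversing or shuffling the input yields the same `high` bit for bit.
random.seed(42)
v = [random.uniform(-1.0, 1.0) for _ in range(1000)]
s1, lows = repro_sum_sketch(v)
s2, _ = repro_sum_sketch(list(reversed(v)))
print(s1 == s2)
```

Applying further extraction levels to `lows` is what lets the paper tighten the error bound while keeping the result reproducible; this sketch stops after one level for clarity.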
