3.9 Article Proceedings Paper

Fast Tridiagonal Solvers on the GPU

Journal

ACM SIGPLAN NOTICES
Volume 45, Issue 5, Pages 127-136

Publisher

ASSOC COMPUTING MACHINERY
DOI: 10.1145/1837853.1693472

Keywords

Algorithms; Measurement; Performance; Tridiagonal Linear System; GPGPU; Performance Optimization

Funding

  1. HP Labs Innovation
  2. National Science Foundation [0541448]
  3. SciDAC Institute for Ultrascale Visualization
  4. NVIDIA
  5. Division of Computing and Communication Foundations
  6. Direct For Computer & Info Scie & Enginr [0541448] Funding Source: National Science Foundation

Ask authors/readers for more resources

We study the performance of three parallel algorithms and their hybrid variants for solving tridiagonal linear systems on a GPU: cyclic reduction (CR), parallel cyclic reduction (PCR) and recursive doubling (RD). We develop an approach to measure, analyze, and optimize the performance of GPU programs in terms of memory access, computation, and control overhead. We find that CR enjoys linear algorithm complexity but suffers from more algorithmic steps and bank conflicts, while PCR and RD have fewer algorithmic steps but do more work each step. To combine the benefits of the basic algorithms, we propose hybrid CR+PCR and CR+RD algorithms, which improve the performance of PCR, RD and CR by 21%, 31% and 61% respectively. Our GPU solvers achieve up to a 28x speedup over a sequential LAPACK solver, and a 12x speedup over a multi-threaded CPU solver.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

3.9
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available