Journal
INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN FLUIDS
Volume 72, Issue 2, Pages 259-268
Publisher
WILEY-BLACKWELL
DOI: 10.1002/fld.3744
Keywords
Eulerian; finite element; partial differential equations; compressible flow; parallelization; explicit
We describe the performance of Chicoma, a 3D unstructured mesh compressible flow solver, on graphics processing unit (GPU) hardware. The approach used to deploy the solver on GPU architectures derives from the threaded multicore execution model used in Chicoma, and attempts to improve memory performance via the application of graph theory techniques. The result is a scheme that can be deployed on the GPU with high-level programming constructs, for example, compiler directives, rather than low-level programming extensions. With an NVIDIA Fermi-class GPU (NVIDIA Corp., Santa Clara, CA, USA) and double precision floating point arithmetic, we observe performance gains of 4-5x on problem sizes of 10^6 to 10^7 tetrahedra. We also compare GPU performance to threaded multicore performance with OpenMP and demonstrate hybrid multicore-GPU calculations with adaptive mesh refinement. Published 2012. This article is a US Government work and is in the public domain in the USA.
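The abstract does not say which graph theory techniques are applied. As a purely illustrative sketch, the Python below shows one common choice for improving memory locality on unstructured meshes: a breadth-first Cuthill-McKee renumbering of mesh nodes, which pulls connected nodes closer together in memory. The function names (`cuthill_mckee`, `bandwidth`) and the six-node example graph are hypothetical and are not taken from Chicoma.

```python
from collections import deque


def cuthill_mckee(adjacency):
    """Return a node ordering (list of old node ids) from a breadth-first
    Cuthill-McKee traversal. Assumed technique, not confirmed by the paper."""
    n = len(adjacency)
    visited = [False] * n
    order = []
    # Handle each connected component, starting from a lowest-degree node.
    for start in sorted(range(n), key=lambda v: len(adjacency[v])):
        if visited[start]:
            continue
        visited[start] = True
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            # Enqueue unvisited neighbours in order of increasing degree.
            for w in sorted(adjacency[v], key=lambda u: len(adjacency[u])):
                if not visited[w]:
                    visited[w] = True
                    queue.append(w)
    return order


def bandwidth(adjacency, order):
    """Largest index distance between connected nodes under an ordering."""
    position = {old: new for new, old in enumerate(order)}
    return max(
        (abs(position[v] - position[w])
         for v in range(len(adjacency)) for w in adjacency[v]),
        default=0,
    )


# Hypothetical path graph 0 - 3 - 1 - 5 - 2 - 4 with scattered node ids.
adjacency = [
    [3],     # node 0
    [3, 5],  # node 1
    [5, 4],  # node 2
    [0, 1],  # node 3
    [2],     # node 4
    [1, 2],  # node 5
]
natural = list(range(len(adjacency)))
reordered = cuthill_mckee(adjacency)

print("bandwidth before:", bandwidth(adjacency, natural))
print("bandwidth after: ", bandwidth(adjacency, reordered))
```

A smaller bandwidth means that the data touched by one edge or element sits closer together in memory, which tends to help cache reuse on multicore CPUs and coalesced access on GPUs. Whether Chicoma uses this particular renumbering is not stated in the abstract.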