Importance of explicit vectorization for CPU and GPU software performance

Journal

JOURNAL OF COMPUTATIONAL PHYSICS
Volume 230, Issue 13, Pages 5383-5398

Publisher

ACADEMIC PRESS INC ELSEVIER SCIENCE
DOI: 10.1016/j.jcp.2011.03.041

Keywords

Performance; Optimization; Vectorization; Monte Carlo; Ising model; GPU

Much of the current focus in high-performance computing is on multi-threading, multi-computing, and graphics processing unit (GPU) computing. However, vectorization and non-parallel optimization techniques, which can often be employed additionally, are less frequently discussed. In this paper, we present an analysis of several optimizations done on both central processing unit (CPU) and GPU implementations of a particular computationally intensive Metropolis Monte Carlo algorithm. Explicit vectorization on the CPU and its GPU equivalent, explicit memory coalescing, are found to be critical to achieving good performance of this algorithm in both environments. The fully-optimized CPU version achieves a 9x to 12x speedup over the original CPU version, in addition to the speedup from multi-threading. This is 2x faster than the fully-optimized GPU version, indicating the importance of optimizing CPU implementations. (C) 2011 Elsevier Inc. All rights reserved.
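
The abstract does not say how the Metropolis updates were arranged for vectorization, so the following is only an illustrative sketch and not the authors' implementation. It assumes a plain 2D ferromagnetic Ising model with coupling J = 1 and periodic boundaries, and uses the common checkerboard (red-black) decomposition, which the abstract does not mention. The function names `neighbor_sum`, `sweep_color`, and `run_demo` are hypothetical. The code is not itself SIMD code; it shows the property that SIMD lanes or coalesced GPU threads depend on: sites of one color can be updated in any order, or all at once, with the same result.

```python
import math
import random


def neighbor_sum(spins, i, j):
    """Sum of the four nearest neighbors with periodic boundaries."""
    n = len(spins)
    return (spins[(i - 1) % n][j] + spins[(i + 1) % n][j]
            + spins[i][(j - 1) % n] + spins[i][(j + 1) % n])


def sweep_color(spins, color, beta, uniforms, order):
    """Metropolis update of every site with (i + j) % 2 == color.

    uniforms[i][j] is a pre-drawn random number for site (i, j), so the
    outcome depends only on the spins and those numbers, not on visit order.
    """
    for i, j in order:
        if (i + j) % 2 != color:
            continue
        # Energy change for flipping spin (i, j), assuming coupling J = 1.
        delta_e = 2.0 * spins[i][j] * neighbor_sum(spins, i, j)
        if delta_e <= 0.0 or uniforms[i][j] < math.exp(-beta * delta_e):
            spins[i][j] = -spins[i][j]


def run_demo(n=8, beta=0.4, seed=1):
    """Run one full sweep twice, visiting sites in opposite orders."""
    # The checkerboard coloring is only consistent across the periodic
    # boundary when the lattice size is even.
    assert n % 2 == 0
    rng = random.Random(seed)
    start = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(n)]
    uniforms = [[rng.random() for _ in range(n)] for _ in range(n)]
    sites = [(i, j) for i in range(n) for j in range(n)]

    forward = [row[:] for row in start]
    backward = [row[:] for row in start]
    for color in (0, 1):
        sweep_color(forward, color, beta, uniforms, sites)
        sweep_color(backward, color, beta, uniforms, sites[::-1])
    return forward, backward


if __name__ == "__main__":
    forward, backward = run_demo()
    print("order-independent:", forward == backward)
```

Because every neighbor of a site has the opposite color, no spin read during a single-color sweep is written during that sweep. That independence is what would allow several same-color sites to be packed into one vector register on a CPU, or assigned to adjacent threads reading adjacent memory on a GPU.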
