Performance-Aware Model for Sparse Matrix-Matrix Multiplication on the Sunway TaihuLight Supercomputer

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS (2019)

Journal

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS

Volume 30, Issue 4, Pages 923-938

Publisher

IEEE COMPUTER SOC

DOI: 10.1109/TPDS.2018.2871189

Keywords

Heterogeneous many-core processor; parallelism; performance analysis; performance-aware; SpGEMM; Sunway TaihuLight supercomputer

Funding

National Key R&D Program of China [2016YFB0200201]
National Outstanding Youth Science Program of National Natural Science Foundation of China [61625202]
International (Regional) Cooperation and Exchange Program of National Natural Science Foundation of China [61661146006, 61860206011]
Program of National Natural Science Foundation of China [61751204, 61806077]

Abstract

General sparse matrix-sparse matrix multiplication (SpGEMM) is one of the fundamental linear operations in a wide variety of scientific applications. To implement efficient SpGEMM for many large-scale applications, this paper proposes scalable and optimized SpGEMM kernels based on the COO, CSR, ELL, and CSC formats on the Sunway TaihuLight supercomputer. First, a multi-level parallelism design for SpGEMM is proposed to exploit the parallelism of more than 10 million cores and to manage memory better on the special Sunway architecture. Optimization strategies, such as load balancing, coalesced DMA transmission, data reuse, vectorized computation, and parallel pipeline processing, are applied to further optimize the performance of the SpGEMM kernels. Second, we thoroughly analyze the performance of the proposed kernels. Third, a performance-aware model for SpGEMM is proposed to select the compressed storage formats for the sparse matrices that achieve the best SpGEMM performance on the Sunway. The experimental results show that the SpGEMM kernels scale well and meet the demands of high-speed computation on large-scale data sets on the Sunway. In addition, the performance-aware model for SpGEMM achieves an average absolute relative error of 8.31 percent when the kernels are executed in a single process and 8.59 percent when the kernels are executed in multiple processes. These results indicate that the proposed performance-aware model is accurate enough to select the best formats for SpGEMM on the Sunway TaihuLight supercomputer.
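To make the operation in the abstract concrete, here is a minimal sequential sketch of SpGEMM on CSR inputs, using the well-known row-wise (Gustavson-style) formulation. It is not the paper's Sunway kernel: it has no multi-level parallelism, DMA transfers, or vectorization. The function name `spgemm_csr` and the tuple layout `(row_ptr, col_idx, values)` are assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical CSR layout for this sketch: (row_ptr, col_idx, values)
CSR = Tuple[List[int], List[int], List[float]]


def spgemm_csr(a: CSR, b: CSR) -> CSR:
    """Compute C = A * B for CSR matrices, one output row at a time."""
    a_ptr, a_col, a_val = a
    b_ptr, b_col, b_val = b
    c_ptr: List[int] = [0]
    c_col: List[int] = []
    c_val: List[float] = []

    for i in range(len(a_ptr) - 1):
        # Sparse accumulator for row i of C: column index -> partial sum
        acc: Dict[int, float] = {}
        for k in range(a_ptr[i], a_ptr[i + 1]):
            j = a_col[k]  # A[i, j] is nonzero, so row j of B contributes
            for t in range(b_ptr[j], b_ptr[j + 1]):
                col = b_col[t]
                acc[col] = acc.get(col, 0.0) + a_val[k] * b_val[t]
        # Emit the accumulated row with column indices in ascending order
        for col in sorted(acc):
            c_col.append(col)
            c_val.append(acc[col])
        c_ptr.append(len(c_col))

    return c_ptr, c_col, c_val


# A = [[1, 0], [0, 2]], B = [[0, 3], [4, 0]]
A: CSR = ([0, 1, 2], [0, 1], [1.0, 2.0])
B: CSR = ([0, 1, 2], [1, 0], [3.0, 4.0])
C = spgemm_csr(A, B)  # CSR form of [[0, 3], [8, 0]]
```

Each output row depends only on one row of A and the rows of B it references, which is the kind of independence a parallel design can distribute across cores. The number of nonzeros per output row is not known in advance, which is one reason load balance appears among the optimizations listed above.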

Implementation and optimization of ChaCha20 stream cipher on Sunway TaihuLight supercomputer

Weilin Cai, Heng Chen, Ziheng Wang, Xingjun Zhang

Summary: Data has always been the most valuable asset for enterprises and research institutions and protecting the confidentiality of data is crucial. To improve the efficiency of large-scale data encryption and decryption, a parallel version of the ChaCha20 stream cipher, optimized for the SW26010 heterogeneous multi-core processor on the Sunway TaihuLight supercomputer, was implemented. Multiple optimization methods were used to achieve a maximum throughput of 32.43 GB/s on a single SW26010 processor and good scalability up to 8296.43 GB/s on 1024 core groups.

JOURNAL OF SUPERCOMPUTING (2022)