4.4 Article

Investigating power capping toward energy-efficient scientific applications

Journal

Publisher

WILEY
DOI: 10.1002/cpe.4485

Keywords

energy efficiency; high performance computing; Intel Xeon Phi; Knights Landing; PAPI; performance analysis; performance counters; power efficiency

Funding

  1. National Science Foundation (NSF) [1450429, 1514286]
  2. Exascale Computing Project [17-SC-20-SC]
  3. NSF Directorate for Computer and Information Science and Engineering, Division of Computer and Network Systems [1514286]
  4. NSF Directorate for Computer and Information Science and Engineering, Office of Advanced Cyberinfrastructure (OAC) [1450429]


The emergence of power efficiency as a primary constraint in processor and system design poses new challenges concerning power and energy awareness for numerical libraries and scientific applications. Power consumption also plays a major role in the design of data centers, which may house petascale or exascale computing systems. At these extreme scales, understanding and improving the energy efficiency of numerical libraries and their related applications becomes crucial to the successful implementation and operation of the computing system. In this paper, we investigate the practice of controlling a compute system's power usage, and we explore how different power caps affect the performance of numerical algorithms with different computational intensities. Further, we determine the impact, in terms of performance and energy usage, that these caps have on a system running scientific applications. This analysis enables us to characterize the types of algorithms that benefit most from these power management schemes. Our experiments are performed using a set of representative kernels and several popular scientific benchmarks. We quantify a number of power and performance measurements and draw observations and conclusions that can be viewed as a roadmap to achieving energy efficiency in the design and execution of scientific algorithms.



Recommended

Article Computer Science, Interdisciplinary Applications

Translational process: Mathematical software perspective

Jack Dongarra, Mark Gates, Piotr Luszczek, Stanimire Tomov

Summary: Each generation of computer architecture brings new challenges for high performance mathematical solvers, requiring the development and analysis of new algorithms embodied in software libraries. These libraries hide architectural details, allowing applications to be portable across platforms, fostering the development and distribution of algorithms.

JOURNAL OF COMPUTATIONAL SCIENCE (2021)

Article Computer Science, Theory & Methods

Accelerating Restarted GMRES With Mixed Precision Arithmetic

Neil Lindquist, Piotr Luszczek, Jack Dongarra

Summary: GMRES is an iterative Krylov solver for sparse, non-symmetric linear equations, where data movement dominates run time. Running GMRES in reduced precision while keeping key operations in full precision improves performance. The mixed-precision approach achieved speedups ranging from 8% to 61% on a GPU-accelerated node, with simpler preconditioners showing higher speedups.

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS (2022)

Article Computer Science, Theory & Methods

Accelerating Geostatistical Modeling and Prediction With Mixed-Precision Computations: A High-Productivity Approach With PaRSEC

Sameh Abdulah, Qinglei Cao, Yu Pei, George Bosilca, Jack Dongarra, Marc G. Genton, David E. Keyes, Hatem Ltaief, Ying Sun

Summary: Geostatistical modeling is a technique to predict geographically distributed data based on statistical models and optimization. By reducing precision and utilizing mathematical structure, the efficiency of Gaussian maximum log-likelihood estimation can be improved. The use of precise mathematics and dynamic runtime software allows for improved performance while maintaining accuracy.

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS (2022)

Article Mathematics, Applied

Block Gram-Schmidt algorithms and their stability properties

Erin Carson, Kathryn Lund, Miroslav Rozložník, Stephen Thomas

Summary: This work provides a comprehensive categorization and stability analysis of block Gram-Schmidt algorithms, and derives new variants. The efficacy and stability of these algorithms are demonstrated through numerical examples. Additionally, the implementations in popular software packages and some open problems are discussed.

LINEAR ALGEBRA AND ITS APPLICATIONS (2022)

Article Computer Science, Theory & Methods

Using long vector extensions for MPI reductions

Dong Zhong, Qinglei Cao, George Bosilca, Jack Dongarra

Summary: The design of modern CPUs has a greater impact on algorithm efficiency than recent modest frequency increases, with vectorization becoming a critical software component. This paper investigates the impact of vectorizing MPI reduction operations to improve time-to-solution and achieve efficiency on multiple architectures. Experiments show that the proposed reduction operations, optimized with vector extensions, significantly reduce completion time for collective communication reductions and improve overall cost and efficiency.

PARALLEL COMPUTING (2022)

Article Computer Science, Theory & Methods

Evaluating Data Redistribution in PaRSEC

Qinglei Cao, George Bosilca, Nuria Losada, Wei Wu, Dong Zhong, Jack Dongarra

Summary: Data redistribution aims to optimize algorithms by reshuffling data, resulting in increased efficiency and reduced time-to-solution. The problem centers on optimizing communication scheduling while accounting for factors such as data size. Task-based runtime systems provide a potential way to address this complexity.

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS (2022)

Editorial Material Computer Science, Interdisciplinary Applications

Computational science for a better future

Sergey Kovalchuk, Valeria V. Krzhizhanovskaya, Maciej Paszynski, Dieter Kranzlmuller, Jack Dongarra, Peter M. A. Sloot

JOURNAL OF COMPUTATIONAL SCIENCE (2022)

Article Computer Science, Hardware & Architecture

The Evolution of Mathematical Software

Jack J. Dongarra

Summary: Computational modeling and simulation, introduced as a new branch of scientific methodology over four decades ago, have embodied enthusiasm and vision and have been widely applied and developed.

COMMUNICATIONS OF THE ACM (2022)

Article Computer Science, Hardware & Architecture

HPC Forecast: Cloudy and Uncertain

Daniel Reed, Dennis Gannon, Jack Dongarra

Summary: This article examines the changes in the technology landscape and potential future directions for HPC operations and innovation.

COMMUNICATIONS OF THE ACM (2023)

Proceedings Paper Computer Science, Hardware & Architecture

Deep Gaussian process with multitask and transfer learning for performance optimization

Wissam M. Sid-Lakhdar, Mohsen Aznaveh, Piotr Luszczek, Jack Dongarra

Summary: This paper combines Deep Gaussian Processes with multitask and transfer learning for the performance modeling and optimization of HPC applications, and demonstrates the advantage of this approach through comparison with state-of-the-art autotuners on two application problems.

2022 IEEE HIGH PERFORMANCE EXTREME COMPUTING VIRTUAL CONFERENCE (HPEC) (2022)

Proceedings Paper Computer Science, Hardware & Architecture

A Framework to Exploit Data Sparsity in Tile Low-Rank Cholesky Factorization

Qinglei Cao, Rabab Alomairy, Yu Pei, George Bosilca, Hatem Ltaief, David Keyes, Jack Dongarra

Summary: The paper presents a general framework that combines the PaRSEC runtime system and the HiCMA numerical library to solve 3D data-sparse problems. The framework utilizes a tile low-rank approximation method to reduce the memory footprint and algorithmic complexity. Experimental results demonstrate the significant performance improvement achieved by the proposed framework on different high-performance supercomputers.

2022 IEEE 36TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS 2022) (2022)

Proceedings Paper Computer Science, Hardware & Architecture

Lossy all-to-all exchange for accelerating parallel 3-D FFTs on hybrid architectures with GPUs

Sebastien Cayrols, Jiali Li, George Bosilca, Stanimire Tomov, Alan Ayala, Jack Dongarra

Summary: In this paper, the authors tackle the challenge of communication in parallel applications by using advanced MPI features and data compression. They also design an approximate FFT algorithm to optimize the speed and accuracy of 3D FFTs.

2022 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER 2022) (2022)

Proceedings Paper Computer Science, Interdisciplinary Applications

Batch QR Factorization on GPUs: Design, Optimization, and Tuning

Ahmad Abdelfattah, Stan Tomov, Jack Dongarra

Summary: QR factorization of dense matrices is a crucial tool in HPC, and its impact extends to various applications. The authors developed a high performance batch QR factorization method for GPUs, achieving significant speedups compared to state-of-the-art libraries on the latest GPU architectures.

COMPUTATIONAL SCIENCE - ICCS 2022, PT I (2022)

Proceedings Paper Computer Science, Hardware & Architecture

Performance Analysis of Parallel FFT on Large Multi-GPU Systems

Alan Ayala, Stan Tomov, Miroslav Stoyanov, Azzam Haidar, Jack Dongarra

Summary: This paper presents a performance study of multidimensional Fast Fourier Transforms (FFT) with GPU accelerators on modern hybrid architectures, evaluating the computational costs and communication bottleneck. A tuning methodology is used to accelerate the FFT computation and reduce communication costs, achieving linear scalability on large-scale systems. The importance of carefully tuning the algorithm is demonstrated for FFT-dependent applications.

2022 IEEE 36TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW 2022) (2022)
