Article

Investigating power capping toward energy-efficient scientific applications

CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE (2019)

Volume 31, Issue 6

Publisher

WILEY

DOI: 10.1002/cpe.4485

Keywords

energy efficiency; high performance computing; Intel Xeon Phi; Knights Landing; PAPI; performance analysis; performance counters; power efficiency

Categories

Computer Science, Software Engineering; Computer Science, Theory & Methods

Funding

National Science Foundation (NSF) [1450429, 1514286]
Exascale Computing Project [17-SC-20-SC]
NSF Directorate for Computer and Information Science and Engineering, Division of Computer and Network Systems [1514286]
NSF Directorate for Computer and Information Science and Engineering, Office of Advanced Cyberinfrastructure (OAC) [1450429]

Abstract

The emergence of power efficiency as a primary constraint in processor and system design poses new challenges concerning power and energy awareness for numerical libraries and scientific applications. Power consumption also plays a major role in the design of data centers, which may house petascale or exascale-level computing systems. At these extreme scales, understanding and improving the energy efficiency of numerical libraries and their related applications becomes a crucial part of the successful implementation and operation of the computing system. In this paper, we investigate the practice of controlling a computing system's power usage, and we explore how different power caps affect the performance of numerical algorithms with different computational intensities. Further, we determine the impact, in terms of performance and energy usage, that these caps have on a system running scientific applications. This analysis enables us to characterize the types of algorithms that benefit most from these power management schemes. Our experiments are performed using a set of representative kernels and several popular scientific benchmarks. We report a number of power and performance measurements and draw observations and conclusions that can be viewed as a roadmap to achieving energy efficiency in the design and execution of scientific algorithms.
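The abstract relates power caps to kernels of different "computational intensity" and to energy usage. The sketch below is an illustration only, not the authors' methodology. It assumes textbook double-precision flop and byte counts for two representative kernels (DAXPY and DGEMM) and a roofline-style ridge point to decide whether a kernel is memory bound. It computes energy as average power times run time and efficiency as Gflop/s per watt. All function and class names are hypothetical, and the machine peaks and run measurements in the usage block are placeholder numbers, not results from the paper.

```python
from dataclasses import dataclass

DOUBLE_BYTES = 8  # size of an IEEE-754 double-precision value


def daxpy_intensity(n: int) -> float:
    """Arithmetic intensity (flops/byte) of y = a*x + y on length-n vectors.

    2n flops; reads x and y and writes y, i.e. 3n doubles of memory traffic.
    """
    return (2 * n) / (3 * n * DOUBLE_BYTES)


def dgemm_intensity(n: int) -> float:
    """Arithmetic intensity of C = A*B + C for n x n matrices.

    2n^3 flops; assumes ideal cache reuse, so traffic is reading A, B, C
    and writing C once each (4n^2 doubles).
    """
    return (2 * n**3) / (4 * n**2 * DOUBLE_BYTES)


def is_memory_bound(intensity: float, peak_gflops: float, peak_bw_gbs: float) -> bool:
    """Roofline-style test: below the ridge point (peak flops / peak bandwidth)
    the kernel is limited by memory bandwidth rather than by compute."""
    ridge = peak_gflops / peak_bw_gbs  # flops per byte
    return intensity < ridge


@dataclass
class RunMeasurement:
    flops: float            # floating-point operations performed
    seconds: float          # wall-clock time of the run
    avg_power_watts: float  # average power drawn over the run

    @property
    def energy_joules(self) -> float:
        # Energy = average power x time
        return self.avg_power_watts * self.seconds

    @property
    def gflops(self) -> float:
        return self.flops / self.seconds / 1e9

    @property
    def gflops_per_watt(self) -> float:
        # (Gflop/s) / W is the same as Gflop per joule of energy
        return self.gflops / self.avg_power_watts


if __name__ == "__main__":
    # Hypothetical machine peaks, for illustration only
    PEAK_GFLOPS, PEAK_BW = 3000.0, 450.0
    for name, ai in [("daxpy", daxpy_intensity(10**7)), ("dgemm", dgemm_intensity(4096))]:
        print(f"{name}: {ai:.3f} flops/byte, "
              f"memory-bound={is_memory_bound(ai, PEAK_GFLOPS, PEAK_BW)}")

    # Hypothetical measurements of the same run without and with a power cap
    uncapped = RunMeasurement(flops=2e12, seconds=10.0, avg_power_watts=200.0)
    capped = RunMeasurement(flops=2e12, seconds=11.0, avg_power_watts=160.0)
    for label, run in [("uncapped", uncapped), ("capped", capped)]:
        print(f"{label}: {run.energy_joules:.0f} J, {run.gflops:.1f} Gflop/s, "
              f"{run.gflops_per_watt:.3f} Gflop/s/W")
```

With these placeholder numbers, the capped run is 10% slower but consumes 1760 J instead of 2000 J. Whether a real kernel trades time for energy this way under a given cap is what power and performance measurements such as those described above are needed to establish.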

Recommended

Article Computer Science, Information Systems

Improving blocked matrix-matrix multiplication routine by utilizing AVX-512 instructions on Intel Knights Landing and Xeon Scalable processors

Yoosang Park, Raehyun Kim, Thi My Tuyen Nguyen, Jaeyoung Choi

Summary: In this study, an improved parallel double-precision general matrix-matrix multiplication (PDGEMM) routine is proposed to enhance performance and time-cost efficiency for modern Intel computers.

CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS (2023)

Article Computer Science, Hardware & Architecture

Performance benchmarking of deep learning framework on Intel Xeon Phi

Chao-Tung Yang, Jung-Chun Liu, Yu-Wei Chan, Endah Kristiani, Chan-Fu Kuo

Summary: This paper compares the performance of Caffe and TensorFlow, two deep learning frameworks, in terms of runtime performance and accuracy. The study uses various datasets and examines the impact of specific optimization techniques on the frameworks' performance. The experimental results demonstrate the benefits of optimizing both Caffe and TensorFlow for the Xeon Phi.

JOURNAL OF SUPERCOMPUTING (2021)

Article Computer Science, Software Engineering

Performance analysis of parallel high-resolution image restoration algorithms on Intel supercomputer

Ivan Lirkov, Stanislav Harizanov, Marcin Paprzycki, Maria Ganzha

Summary: This article presents an experimental performance study of a parallel implementation of two Poissonian image restoration algorithms, showing significant improvement in execution times when running experiments for a variety of problem sizes and number of threads.

CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE (2021)

Article Computer Science, Hardware & Architecture

Evaluating the performance of FFT library implementations on modern hybrid computing systems

Sergey Malkovsky, Aleksei A. Sorokin, Georgiy Tsoy, Sergey P. Korolev, Sergey Smagin, Vadim A. Kondrashev

Summary: Fast Fourier transform is widely used in scientific and engineering fields, especially in speech and image recognition, signal analysis, and material modeling. Research on the efficiency of discrete Fourier transform computation on new hybrid computing systems has been conducted, providing conclusions on performance and recommendations for optimal operation of mathematical libraries.

JOURNAL OF SUPERCOMPUTING (2021)

Article Computer Science, Software Engineering

Preliminary study on the automatic parallelism optimization model for image enhancement algorithms based on Intel's® Xeon Phi

Fang Huang, Hao Yang, Jian Tao, Jian Wang, Xicheng Tan

Summary: This research proposes optimization methods for the Par4All algorithm based on high-performance computing platforms, including optimizing the search module, dynamic thread setting, and collaborative parallelization, which significantly improve the performance of automatic parallel algorithms. The optimized algorithm achieves significantly higher speedups for multiscale Retinex, Gaussian-filtering, and median-filtering algorithms.

CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE (2021)

Article Computer Science, Information Systems

A Multiformalism-Based Model for Performance Evaluation of Green Data Centres

Enrico Barbierato, Daniele Manini, Marco Gribaudo

Summary: Although the coexistence of Arm and Intel technologies in green data centres is feasible, there are significant challenges due to differences in instruction sets and power consumption. This article presents a multiformalism-based data centre model and reviews performance indices to address issues such as system underutilization and workload balance estimation. The model aims to provide an effective solution for evaluating the performance of hybrid hardware architectures.

ELECTRONICS (2023)

Article Engineering, Biomedical

Challenges and opportunities for the simulation of calcium waves on modern multi-core and many-core parallel computing platforms

Carlos Barajas, Matthias K. Gobbert, Gerson C. Kroiz, Bradford E. Peercy

Summary: The study compares the performance of the second-generation Intel Xeon Phi with the latest multicore CPU by solving a system of nonlinear reaction-diffusion partial differential equations. The results demonstrate that both hardware platforms exhibit excellent parallel scalability, but for certain problems, modern multicore CPUs outperform the specialized many-core Intel Xeon Phi architecture.

INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN BIOMEDICAL ENGINEERING (2021)

Article Computer Science, Hardware & Architecture

High-performance simulations of turbulent boundary layer flow using Intel Xeon Phi many-core processors

Ji-Hoon Kang, Jinyul Hwang, Hyung Jin Sung, Hoon Ryu

Summary: This study examines the feasibility of cost-efficient direct numerical simulation (DNS) on Intel Xeon Phi many-core processors and provides a practical guideline for accelerating large-scale and precise DNS. The optimized code is validated through numerical simulations and comparison with previous studies.

JOURNAL OF SUPERCOMPUTING (2021)

Article Engineering, Aerospace

Landing Performance Study for Four Wheels Twin Tandem Landing Gear Based on Drop Test

Wei Fang, Lingang Zhu, Youshan Wang

Summary: Drop tests were conducted on a twin tandem landing gear with different filling parameters, showing its ability to absorb landing impact. The pitch damper only absorbs a small portion of the pitching kinetic energy during tail-down landing. Furthermore, the orifice diameter has little effect on the axial load, while the pressure can affect the vibration attenuation.

AEROSPACE (2022)

Article Computer Science, Software Engineering

Strategies and software support for the management of hardware performance counters

Stefano Carna, Romolo Marotta, Alessandro Pellegrini, Francesco Quaglia

Summary: Hardware performance counters (HPCs) are essential for post-mortem performance profiling and are now being repurposed for activities such as self-tuning and system inspection. This article discusses practical strategies to exploit HPCs beyond post-mortem profiling, suitable for different application contexts. It also provides a general primer on HPCs usage on Linux and presents an experimental assessment of the viability of the proposed strategies.

SOFTWARE-PRACTICE & EXPERIENCE (2023)

Article Computer Science, Theory & Methods

Bi-Objective Optimization of Data-Parallel Applications on Heterogeneous HPC Platforms for Performance and Energy Through Workload Distribution

Hamidreza Khaleghzadeh, Muhammad Fahad, Arsalan Shahid, Ravi Reddy Manumachu, Alexey Lastovetsky

Summary: This article investigates bi-objective optimization on heterogeneous processors, proposing a new solution method that accurately models resource contention and NUMA in modern parallel platforms to return Pareto-optimal solutions. Experimental analysis shows that the method determines a superior Pareto front containing both load balanced and load imbalanced solutions.

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS (2021)

Article Computer Science, Cybernetics

Large-Scale Parallelization of Human Migration Simulation

Derek Groen, Nikela Papadopoulou, Petros Anastasiadis, Marcin Lawenda, Lukasz Szustak, Sergiy Gogolenko, Hamid Arabnejad, Alireza Jahani

Summary: Forced displacement, particularly due to violent conflicts, is widespread globally, with a current estimate of over 82 million people being forcibly displaced. This has made migration a critical issue for humanity. The Flee simulation code is an agent-based modeling tool that can accurately predict population displacements in civil war scenarios but requires significant computational power.

IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS (2023)

Review Green & Sustainable Science & Technology

A review on the decarbonization of high-performance computing centers

C. A. Silva, R. Vilaca, A. Pereira, R. J. Bessa

Summary: The decarbonization of high-performance computing centers is crucial for improving their environmental and financial performance. State-of-the-art supercomputers are growing in computing power and evolving in terms of both energy and information technology infrastructure. Collaboration between policy and energy sectors can generate new revenue streams and facilitate integration of these centers into energy systems.

RENEWABLE & SUSTAINABLE ENERGY REVIEWS (2024)

Article Engineering, Aerospace

Crashworthiness Performance Simulation and Analysis of Combined-Type Landing Gear Buffer

Xingbo Fang, Hu Chen, Yuying Han, Xinhong Xie, Xiaohui Wei, Hong Nie

Summary: This paper proposes a new design for landing gear buffers by combining oleo-pneumatic buffers with expansion tubes to enhance crashworthiness. Numerical and finite element analyses were conducted to simulate the crushing process of the buffers at crash speed, and the buffer force-displacement curve was compared with a theoretical model. The results demonstrate that the combined-type buffer exhibits high efficiency and superior performance.

JOURNAL OF AEROSPACE ENGINEERING (2022)

Proceedings Paper Computer Science, Hardware & Architecture

Supporting RISC-V Performance Counters Through Linux Performance Analysis Tools

Joao Mario Domingos, Tiago Rocha, Nuno Neves, Nuno Roma, Pedro Tomas, Leonel Sousa

Summary: Increased attention to RISC-V open Instruction Set Architecture has led to its expansion into high-performance computing. However, the lack of powerful performance monitoring tools results in suboptimal applications. This paper proposes extensions and modifications to address this limitation and achieve full support for RISC-V performance monitoring.

2023 IEEE 34TH INTERNATIONAL CONFERENCE ON APPLICATION-SPECIFIC SYSTEMS, ARCHITECTURES AND PROCESSORS, ASAP (2023)

Article Computer Science, Interdisciplinary Applications

Translational process: Mathematical software perspective

Jack Dongarra, Mark Gates, Piotr Luszczek, Stanimire Tomov

Summary: Each generation of computer architecture brings new challenges for high performance mathematical solvers, requiring the development and analysis of new algorithms embodied in software libraries. These libraries hide architectural details, allowing applications to be portable across platforms, fostering the development and distribution of algorithms.

JOURNAL OF COMPUTATIONAL SCIENCE (2021)

Article Computer Science, Theory & Methods

Accelerating Restarted GMRES With Mixed Precision Arithmetic

Neil Lindquist, Piotr Luszczek, Jack Dongarra

Summary: GMRES is an iterative Krylov solver for sparse, non-symmetric linear equations, where data movement dominates run time. Running GMRES in reduced precision while keeping key operations in full precision improves performance. The mixed-precision approach achieved speedups ranging from 8% to 61% on a GPU-accelerated node, with simpler preconditioners showing higher speedups.

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS (2022)

Article Computer Science, Theory & Methods

Accelerating Geostatistical Modeling and Prediction With Mixed-Precision Computations: A High-Productivity Approach With PaRSEC

Sameh Abdulah, Qinglei Cao, Yu Pei, George Bosilca, Jack Dongarra, Marc G. Genton, David E. Keyes, Hatem Ltaief, Ying Sun

Summary: Geostatistical modeling is a technique to predict geographically distributed data based on statistical models and optimization. By reducing precision and utilizing mathematical structure, the efficiency of Gaussian maximum log-likelihood estimation can be improved. The use of precise mathematics and dynamic runtime software allows for improved performance while maintaining accuracy.

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS (2022)

Article Mathematics, Applied

Block Gram-Schmidt algorithms and their stability properties

Erin Carson, Kathryn Lund, Miroslav Rozložník, Stephen Thomas

Summary: This work provides a comprehensive categorization and stability analysis of block Gram-Schmidt algorithms, and derives new variants. The efficacy and stability of these algorithms are demonstrated through numerical examples. Additionally, the implementations in popular software packages and some open problems are discussed.

LINEAR ALGEBRA AND ITS APPLICATIONS (2022)

Article Computer Science, Theory & Methods

Using long vector extensions for MPI reductions

Dong Zhong, Qinglei Cao, George Bosilca, Jack Dongarra

Summary: The design of modern CPUs has a greater impact on algorithm efficiency than recent modest frequency increases, with vectorization becoming a critical software component. This paper investigates the impact of vectorizing MPI reduction operations to improve time-to-solution and achieve efficiency on multiple architectures. Experiments show that the proposed vector extension optimized reduction operations significantly reduce completion time for collective communication reductions and benefit overall cost and efficiency.

PARALLEL COMPUTING (2022)

Article Computer Science, Theory & Methods

Evaluating Data Redistribution in PaRSEC

Qinglei Cao, George Bosilca, Nuria Losada, Wei Wu, Dong Zhong, Jack Dongarra

Summary: Data redistribution aims to optimize algorithms by reshuffling data, resulting in increased efficiency and reduced time-to-solution. This problem focuses on optimizing communication scheduling and considering factors such as data size. Task-based runtime systems provide a potential solution to address the complexity.

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS (2022)

Editorial Material Computer Science, Interdisciplinary Applications

Computational science for a better future

Sergey Kovalchuk, Valeria V. Krzhizhanovskaya, Maciej Paszynski, Dieter Kranzlmuller, Jack Dongarra, Peter M. A. Sloot

JOURNAL OF COMPUTATIONAL SCIENCE (2022)

Article Computer Science, Hardware & Architecture

The Evolution of Mathematical Software

Jack J. Dongarra

Summary: Computational modeling and simulation, introduced as a new branch of scientific methodology over four decades ago, have embodied enthusiasm and vision and have been widely applied and developed.

COMMUNICATIONS OF THE ACM (2022)

Article Computer Science, Hardware & Architecture

HPC Forecast: Cloudy and Uncertain

Daniel Reed, Dennis Gannon, Jack Dongarra

Summary: This article examines the changes in the technology landscape and potential future directions for HPC operations and innovation.

COMMUNICATIONS OF THE ACM (2023)

Proceedings Paper Computer Science, Hardware & Architecture

Deep Gaussian process with multitask and transfer learning for performance optimization

Wissam M. Sid-Lakhdar, Mohsen Aznaveh, Piotr Luszczek, Jack Dongarra

Summary: This paper combines Deep Gaussian Processes with multitask and transfer learning for the performance modeling and optimization of HPC applications, and demonstrates the advantage of this approach through comparison with state-of-the-art autotuners on two application problems.

2022 IEEE HIGH PERFORMANCE EXTREME COMPUTING VIRTUAL CONFERENCE (HPEC) (2022)

Proceedings Paper Computer Science, Hardware & Architecture

A Framework to Exploit Data Sparsity in Tile Low-Rank Cholesky Factorization

Qinglei Cao, Rabab Alomairy, Yu Pei, George Bosilca, Hatem Ltaief, David Keyes, Jack Dongarra

Summary: The paper presents a general framework that combines the PaRSEC runtime system and the HiCMA numerical library to solve 3D data-sparse problems. The framework utilizes a tile low-rank approximation method to reduce the memory footprint and algorithmic complexity. Experimental results demonstrate the significant performance improvement achieved by the proposed framework on different high-performance supercomputers.

2022 IEEE 36TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS 2022) (2022)

Proceedings Paper Computer Science, Hardware & Architecture

Lossy all-to-all exchange for accelerating parallel 3-D FFTs on hybrid architectures with GPUs

Sebastien Cayrols, Jiali Li, George Bosilca, Stanimire Tomov, Alan Ayala, Jack Dongarra

Summary: In this paper, the authors tackle the challenge of communication in parallel applications by using advanced MPI features and data compression. They also design an approximate FFT algorithm to optimize the speed and accuracy of 3D FFTs.

2022 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER 2022) (2022)

Proceedings Paper Computer Science, Interdisciplinary Applications

Batch QR Factorization on GPUs: Design, Optimization, and Tuning

Ahmad Abdelfattah, Stan Tomov, Jack Dongarra

Summary: QR factorization of dense matrices is a crucial tool in HPC, and its impact extends to various applications. The authors developed a high performance batch QR factorization method for GPUs, achieving significant speedups compared to state-of-the-art libraries on the latest GPU architectures.

COMPUTATIONAL SCIENCE - ICCS 2022, PT I (2022)

Proceedings Paper Computer Science, Hardware & Architecture

Performance Analysis of Parallel FFT on Large Multi-GPU Systems

Alan Ayala, Stan Tomov, Miroslav Stoyanov, Azzam Haidar, Jack Dongarra

Summary: This paper presents a performance study of multidimensional Fast Fourier Transforms (FFT) with GPU accelerators on modern hybrid architectures, evaluating the computational costs and communication bottleneck. A tuning methodology is used to accelerate the FFT computation and reduce communication costs, achieving linear scalability on large-scale systems. The importance of carefully tuning the algorithm is demonstrated for FFT-dependent applications.

2022 IEEE 36TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW 2022) (2022)
