Article
Computer Science, Information Systems
Yoosang Park, Raehyun Kim, Thi My Tuyen Nguyen, Jaeyoung Choi
Summary: In this study, an improved parallel double-precision general matrix-matrix multiplication (PDGEMM) routine is proposed to enhance performance and time-cost efficiency for modern Intel computers.
CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS
(2023)
Article
Computer Science, Hardware & Architecture
Chao-Tung Yang, Jung-Chun Liu, Yu-Wei Chan, Endah Kristiani, Chan-Fu Kuo
Summary: This paper compares the performance metrics of Caffe and TensorFlow, two deep learning frameworks, in terms of runtime performance and accuracy. The study uses various datasets and examines the impact of specific optimization techniques on the frameworks' performance. The experimental results demonstrate the benefits of optimizing Xeon Phi for both Caffe and TensorFlow frameworks.
JOURNAL OF SUPERCOMPUTING
(2021)
Article
Computer Science, Software Engineering
Ivan Lirkov, Stanislav Harizanov, Marcin Paprzycki, Maria Ganzha
Summary: This article presents an experimental performance study of a parallel implementation of two Poissonian image restoration algorithms, showing significant improvement in execution times across a variety of problem sizes and numbers of threads.
CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE
(2021)
Article
Computer Science, Hardware & Architecture
Sergey Malkovsky, Aleksei A. Sorokin, Georgiy Tsoy, Sergey P. Korolev, Sergey Smagin, Vadim A. Kondrashev
Summary: Fast Fourier transform is widely used in scientific and engineering fields, especially in speech and image recognition, signal analysis, and material modeling. Research on the efficiency of discrete Fourier transform computation on new hybrid computing systems has been conducted, providing conclusions on performance and recommendations for optimal operation of mathematical libraries.
JOURNAL OF SUPERCOMPUTING
(2021)
Article
Computer Science, Software Engineering
Fang Huang, Hao Yang, Jian Tao, Jian Wang, Xicheng Tan
Summary: This research proposes optimization methods for the Par4All algorithm based on high-performance computing platforms, which significantly improve the performance of automatic parallel algorithms, including optimizing the search module, dynamic thread setting, and collaborative parallelization. The optimized algorithm achieves significant acceleration ratio improvements in multiscale Retinex, Gaussian-filtering, and median-filtering algorithms.
CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE
(2021)
Article
Computer Science, Information Systems
Enrico Barbierato, Daniele Manini, Marco Gribaudo
Summary: Although the coexistence of ARM and INTEL technologies in green data centres is feasible, there are significant challenges due to differences in instruction sets and power consumption. This article presents a multiformalism-based data centre model and reviews performance indices to address issues such as system underutilization and workload balance estimation. The model aims to provide an effective solution for evaluating the performance of hybrid hardware architectures.
Article
Engineering, Biomedical
Carlos Barajas, Matthias K. Gobbert, Gerson C. Kroiz, Bradford E. Peercy
Summary: The study compares the performance of the second-generation Intel Xeon Phi with the latest multicore CPU by solving a system of nonlinear reaction-diffusion partial differential equations. The results demonstrate that both hardware platforms exhibit excellent parallel scalability, but for certain problems, modern multicore CPUs outperform the specialized many-core Intel Xeon Phi architecture.
INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN BIOMEDICAL ENGINEERING
(2021)
Article
Computer Science, Hardware & Architecture
Ji-Hoon Kang, Jinyul Hwang, Hyung Jin Sung, Hoon Ryu
Summary: This study examines the feasibility of cost-efficient direct numerical simulation (DNS) on Intel Xeon Phi many-core processors and provides a practical guideline for accelerating large-scale and precise DNS. The optimized code is validated through numerical simulations and comparison with previous studies.
JOURNAL OF SUPERCOMPUTING
(2021)
Article
Engineering, Aerospace
Wei Fang, Lingang Zhu, Youshan Wang
Summary: Drop tests were conducted on a twin tandem landing gear with different filling parameters, showing its ability to absorb landing impact. The pitch damper only absorbs a small portion of the pitching kinetic energy during tail-down landing. Furthermore, the orifice diameter has little effect on the axial load, while the pressure can affect the vibration attenuation.
Article
Computer Science, Software Engineering
Stefano Carna, Romolo Marotta, Alessandro Pellegrini, Francesco Quaglia
Summary: Hardware performance counters (HPCs) are essential for post-mortem performance profiling and are now being repurposed for activities such as self-tuning and system inspection. This article discusses practical strategies to exploit HPCs beyond post-mortem profiling, suitable for different application contexts. It also provides a general primer on HPCs usage on Linux and presents an experimental assessment of the viability of the proposed strategies.
SOFTWARE-PRACTICE & EXPERIENCE
(2023)
Article
Computer Science, Theory & Methods
Hamidreza Khaleghzadeh, Muhammad Fahad, Arsalan Shahid, Ravi Reddy Manumachu, Alexey Lastovetsky
Summary: This article investigates bi-objective optimization on heterogeneous processors, proposing a new solution method that accurately models resource contention and NUMA in modern parallel platforms to return Pareto-optimal solutions. Experimental analysis shows that the method determines a superior Pareto front containing both load balanced and load imbalanced solutions.
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS
(2021)
Article
Computer Science, Cybernetics
Derek Groen, Nikela Papadopoulou, Petros Anastasiadis, Marcin Lawenda, Lukasz Szustak, Sergiy Gogolenko, Hamid Arabnejad, Alireza Jahani
Summary: Forced displacement, particularly due to violent conflicts, is widespread globally, with a current estimate of over 82 million people being forcibly displaced. This has made migration a critical issue for humanity. The Flee simulation code is an agent-based modeling tool that can accurately predict population displacements in civil war scenarios but requires significant computational power.
IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS
(2023)
Review
Green & Sustainable Science & Technology
C. A. Silva, R. Vilaca, A. Pereira, R. J. Bessa
Summary: The decarbonization of high-performance computing centers is crucial for improving their environmental and financial performance. State-of-the-art supercomputers are growing in computing power and evolving in terms of both energy and information technology infrastructure. Collaboration between policy and energy sectors can generate new revenue streams and facilitate integration of these centers into energy systems.
RENEWABLE & SUSTAINABLE ENERGY REVIEWS
(2024)
Article
Engineering, Aerospace
Xingbo Fang, Hu Chen, Yuying Han, Xinhong Xie, Xiaohui Wei, Hong Nie
Summary: This paper proposes a new design for landing gear buffers by combining oleo-pneumatic buffers with expansion tubes to enhance crashworthiness. Numerical and finite element analyses were conducted to simulate the crushing process of the buffers at crash speed, and the buffer force-displacement curve was compared with a theoretical model. The results demonstrate that the combined-type buffer exhibits high efficiency and superior performance.
JOURNAL OF AEROSPACE ENGINEERING
(2022)
Proceedings Paper
Computer Science, Hardware & Architecture
Joao Mario Domingos, Tiago Rocha, Nuno Neves, Nuno Roma, Pedro Tomas, Leonel Sousa
Summary: Increased attention to RISC-V open Instruction Set Architecture has led to its expansion into high-performance computing. However, the lack of powerful performance monitoring tools results in suboptimal applications. This paper proposes extensions and modifications to address this limitation and achieve full support for RISC-V performance monitoring.
2023 IEEE 34TH INTERNATIONAL CONFERENCE ON APPLICATION-SPECIFIC SYSTEMS, ARCHITECTURES AND PROCESSORS, ASAP
(2023)
Article
Computer Science, Interdisciplinary Applications
Jack Dongarra, Mark Gates, Piotr Luszczek, Stanimire Tomov
Summary: Each generation of computer architecture brings new challenges for high performance mathematical solvers, requiring the development and analysis of new algorithms embodied in software libraries. These libraries hide architectural details, allowing applications to be portable across platforms, fostering the development and distribution of algorithms.
JOURNAL OF COMPUTATIONAL SCIENCE
(2021)
Article
Computer Science, Theory & Methods
Neil Lindquist, Piotr Luszczek, Jack Dongarra
Summary: GMRES is an iterative Krylov solver for sparse, non-symmetric linear equations, where data movement dominates run time. Running GMRES in reduced precision while keeping key operations in full precision improves performance. The mixed-precision approach achieved speedups ranging from 8 to 61% on a GPU-accelerated node, with simpler preconditioners showing higher speedups.
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS
(2022)
Article
Computer Science, Theory & Methods
Sameh Abdulah, Qinglei Cao, Yu Pei, George Bosilca, Jack Dongarra, Marc G. Genton, David E. Keyes, Hatem Ltaief, Ying Sun
Summary: Geostatistical modeling is a technique to predict geographically distributed data based on statistical models and optimization. By reducing precision and utilizing mathematical structure, the efficiency of Gaussian maximum log-likelihood estimation can be improved. The use of precise mathematics and dynamic runtime software allows for improved performance while maintaining accuracy.
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS
(2022)
Article
Mathematics, Applied
Erin Carson, Kathryn Lund, Miroslav Rozloznik, Stephen Thomas
Summary: This work provides a comprehensive categorization and stability analysis of block Gram-Schmidt algorithms, and derives new variants. The efficacy and stability of these algorithms are demonstrated through numerical examples. Additionally, the implementations in popular software packages and some open problems are discussed.
LINEAR ALGEBRA AND ITS APPLICATIONS
(2022)
Article
Computer Science, Theory & Methods
Dong Zhong, Qinglei Cao, George Bosilca, Jack Dongarra
Summary: The design of modern CPUs has a greater impact on algorithm efficiency than recent modest frequency increases, with vectorization becoming a critical software component. This paper investigates the impact of vectorizing MPI reduction operations to improve time-to-solution and achieve efficiency on multiple architectures. Experiments show that the proposed reduction operations, optimized with vector extensions, significantly reduce completion time for collective communication reductions and benefit overall cost and efficiency.
PARALLEL COMPUTING
(2022)
Article
Computer Science, Theory & Methods
Qinglei Cao, George Bosilca, Nuria Losada, Wei Wu, Dong Zhong, Jack Dongarra
Summary: Data redistribution aims to optimize algorithms by reshuffling data, resulting in increased efficiency and reduced time-to-solution. The problem involves optimizing communication scheduling while considering factors such as data size. Task-based runtime systems provide a potential solution to address the complexity.
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS
(2022)
Editorial Material
Computer Science, Interdisciplinary Applications
Sergey Kovalchuk, Valeria V. Krzhizhanovskaya, Maciej Paszynski, Dieter Kranzlmuller, Jack Dongarra, Peter M. A. Sloot
JOURNAL OF COMPUTATIONAL SCIENCE
(2022)
Article
Computer Science, Hardware & Architecture
Jack J. Dongarra
Summary: Computational modeling and simulation, introduced as a new branch of scientific methodology over four decades ago, have embodied enthusiasm and vision and have been widely applied and developed.
COMMUNICATIONS OF THE ACM
(2022)
Article
Computer Science, Hardware & Architecture
Daniel Reed, Dennis Gannon, Jack Dongarra
Summary: This article examines the changes in the technology landscape and potential future directions for HPC operations and innovation.
COMMUNICATIONS OF THE ACM
(2023)
Proceedings Paper
Computer Science, Hardware & Architecture
Wissam M. Sid-Lakhdar, Mohsen Aznaveh, Piotr Luszczek, Jack Dongarra
Summary: This paper combines Deep Gaussian Processes with multitask and transfer learning for the performance modeling and optimization of HPC applications, and demonstrates the advantage of this approach through comparison with state-of-the-art autotuners on two application problems.
2022 IEEE HIGH PERFORMANCE EXTREME COMPUTING VIRTUAL CONFERENCE (HPEC)
(2022)
Proceedings Paper
Computer Science, Hardware & Architecture
Qinglei Cao, Rabab Alomairy, Yu Pei, George Bosilca, Hatem Ltaief, David Keyes, Jack Dongarra
Summary: The paper presents a general framework that combines the PaRSEC runtime system and the HiCMA numerical library to solve 3D data-sparse problems. The framework utilizes a tile low-rank approximation method to reduce the memory footprint and algorithmic complexity. Experimental results demonstrate the significant performance improvement achieved by the proposed framework on different high-performance supercomputers.
2022 IEEE 36TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS 2022)
(2022)
Proceedings Paper
Computer Science, Hardware & Architecture
Sebastien Cayrols, Jiali Li, George Bosilca, Stanimire Tomov, Alan Ayala, Jack Dongarra
Summary: In this paper, the authors tackle the challenge of communication in parallel applications by using advanced MPI features and data compression. They also design an approximate FFT algorithm to optimize the speed and accuracy of 3D FFTs.
2022 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER 2022)
(2022)
Proceedings Paper
Computer Science, Interdisciplinary Applications
Ahmad Abdelfattah, Stan Tomov, Jack Dongarra
Summary: QR factorization of dense matrices is a crucial tool in HPC, and its impact extends to various applications. The authors developed a high performance batch QR factorization method for GPUs, achieving significant speedups compared to state-of-the-art libraries on the latest GPU architectures.
COMPUTATIONAL SCIENCE - ICCS 2022, PT I
(2022)
Proceedings Paper
Computer Science, Hardware & Architecture
Alan Ayala, Stan Tomov, Miroslav Stoyanov, Azzam Haidar, Jack Dongarra
Summary: This paper presents a performance study of multidimensional Fast Fourier Transforms (FFT) with GPU accelerators on modern hybrid architectures, evaluating the computational costs and communication bottleneck. A tuning methodology is used to accelerate the FFT computation and reduce communication costs, achieving linear scalability on large-scale systems. The importance of carefully tuning the algorithm is demonstrated for FFT-dependent applications.
2022 IEEE 36TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW 2022)
(2022)