Article
Computer Science, Hardware & Architecture
Victor M. Garcia-Molla, Pedro Alonso-Jorda, Ricardo Garcia-Laguia
Summary: This paper proposes a parallel version of the Suzuki algorithm designed for GPUs, which splits the image into small rectangles and tracks the borders with a dedicated thread for each rectangle. The experimental results show that, on the GPUs and CPUs tested, the proposed parallel algorithm is more than 10 times faster than the sequential CPU routine.
JOURNAL OF SUPERCOMPUTING
(2022)
Article
Mathematics, Applied
M. Dessole, F. Marcuzzi
Summary: The paper presents the PARASOF algorithm for solving linear systems with BABD matrices on massively parallel computing systems, comparing it with the state-of-the-art SOF algorithm in terms of stability. It discusses the design and implementation of PARASOF, as well as its theoretical and experimental performance.
NUMERICAL ALGORITHMS
(2021)
Article
Environmental Sciences
Oscar Ferraz, Vitor Silva, Gabriel Falcao
Summary: The study investigates a parallel solution on embedded systems that reduces development effort and power consumption, using a low-power GPU for image prediction and exploiting multiple CPU cores and the GPU in parallel for image entropy encoding.
Article
Computer Science, Software Engineering
Marcelo de Matos Menezes, Salles Viana Gomes de Magalhaes, Matheus Aguilar de Oliveira, W. Randolph Franklin, Rodrigo Eduardo de Oliveira Bauer Chichorro
Summary: This paper presents an algorithm that accelerates the evaluation of a large number of 3D geometric predicates by utilizing the strengths of both CPU and GPU. The algorithm progressively eliminates non-intersecting pairs and identifies the actual intersections. It achieves significant parallel speedup and can efficiently find a large number of intersections in a short time.
COMPUTER-AIDED DESIGN
(2022)
Article
Computer Science, Theory & Methods
Andrea Formisano, Raffaella Gentilini, Flavio Vella
Summary: This article discusses the importance of modeling the consumption of limited resources in embedded controllers, as well as the challenges in solving certain game instances. Research shows that sequential implementations, as well as multi-core CPU and GPU parallel implementations, are limited in efficiency when solving these problems. By optimizing algorithms and introducing new parallel implementations, the time to solution can be significantly reduced.
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS
(2021)
Article
Computer Science, Software Engineering
Yuxuan Hou, Zhong Ren, Qiming Hou, Yubo Tao, Yankai Jiang, Wei Chen
Summary: Neuron tracing, or reconstruction, is crucial for studying neuronal circuits and brain mechanisms. We present InstantTrace, a novel framework that utilizes parallel tracing on GPUs and achieves a more than 20x speed boost compared to state-of-the-art methods, while maintaining comparable reconstruction quality. The framework takes advantage of the sparse features and tree structure of the neuron image and parallelizes all stages of the tracing pipeline on the GPU. A test on a whole mouse brain OM image demonstrated that our framework can achieve a preliminary reconstruction within 1 hour on a single GPU, which is an order of magnitude faster than existing methods. This framework has the potential to significantly improve the efficiency of the neuron tracing process and provide instant preliminary results for manual verification and refinement.
Article
Computer Science, Interdisciplinary Applications
Hadi Zolfaghari, Dominik Obrist
Summary: The paper introduces an algebraically simpler yet more advanced parallel implementation for solving the Poisson problem on a large number of distributed GPUs. The combination of data parallelism and task parallelism reduces communication overhead, leading to a significant decrease in time-to-solution and computational cost for the Poisson problem.
JOURNAL OF COMPUTATIONAL PHYSICS
(2021)
Article
Computer Science, Theory & Methods
Feng Zhang, Zheng Chen, Chenyang Zhang, Amelie Chi Zhou, Jidong Zhai, Xiaoyong Du
Summary: This article introduces a GPU-based framework named ParSecureML to improve the performance of secure machine learning algorithms based on two-party computation. ParSecureML solves challenges including complex computation patterns, frequent data transmission between CPU and GPU, and inter-node data dependence. Compared to state-of-the-art frameworks, ParSecureML achieves an average speedup of 33.8x.
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS
(2021)
Article
Physics, Multidisciplinary
Takuya Okuyama, Andre Rohm, Takatomo Mihana, Makoto Naruse
Summary: Matrix multiplication is important for various applications, and reducing computation time is crucial. Despite the potential of GPUs, research has not focused on accelerating approximate matrix multiplications (AMMs) for general matrices. In this paper, we propose a method to improve Monte Carlo AMMs, with optimal values for hyperparameters. The proposed method enhances matrix product approximation without increasing computation time and is compatible with parallel operations on GPUs, halving the computation time compared to the conventional power method.
Review
Biochemical Research Methods
You Zou, Yuejie Zhu, Yaohang Li, Fang-Xiang Wu, Jianxin Wang
Summary: The rapid increase of genome data from gene sequencing technologies presents a massive challenge to data processing. To address this, researchers have proposed methods in big data storage, efficient algorithm design, and parallel computing. This review investigates popular parallel programming technologies for genome sequence processing, discussing models, applications, and limitations.
BRIEFINGS IN BIOINFORMATICS
(2021)
Article
Computer Science, Theory & Methods
Dina G. Mahmoud, Vincent Lenders, Mirjana Stojilovic
Summary: This article investigates electrical-level attacks on CPUs, FPGAs, and GPUs, and explores their impact on heterogeneous systems. Additionally, it highlights open research directions for ensuring the security of heterogeneous computing systems in the future.
ACM COMPUTING SURVEYS
(2023)
Article
Computer Science, Hardware & Architecture
Yuzhu Wang, Mingxin Guo, Yuan Zhao, Jinrong Jiang
Summary: This paper presents an approach to running a large-scale, computationally intensive, longwave radiative transfer model on a GPU cluster. A CUDA-based acceleration algorithm for the RRTMG longwave radiation scheme on multiple GPUs is proposed, and a heterogeneous, hybrid programming paradigm (MPI+CUDA) is utilized with the RRTMG_LW on a GPU cluster. Experimental results show that the multi-GPU acceleration algorithm is valid, scalable, and highly efficient, achieving a 77.78x speedup when compared to a single Intel Xeon E5-2680 CPU core.
JOURNAL OF SUPERCOMPUTING
(2021)
Review
Energy & Fuels
Ahmed Al-Shafei, Hamidreza Zareipour, Yankai Cao
Summary: The transition towards net-zero emissions is inevitable for humanity's future. Electrical energy systems produce the most emissions of any sector. This requires a transition towards an emission-free smart grid, which involves integrating wind and solar-powered resources and adopting new paradigms such as distributed resources and IoT technologies. However, these changes will pose unprecedented challenges in terms of scale, complexity, and data, making it important to consider high performance computing, parallel computing, and cloud computing in future electrical energy studies.
Article
Computer Science, Hardware & Architecture
J. J. Moreno, J. Miroforidis, E. Filatovas, I. Kaliszewski, E. M. Garzon
Summary: This work reports on the authors' attempt to put radiotherapy planning in a 'win-win' situation by exploring unexploited reserves in optimization methods and algorithms, as well as utilizing High Performance Computing resources. By incorporating sparse matrix procedures into optimization algorithms and leveraging graphics processing units, they were able to achieve speedups in optimization computations, as demonstrated in numerical testing on a clinical case.
JOURNAL OF SUPERCOMPUTING
(2021)
Review
Computer Science, Theory & Methods
Andre Luis Barroso Almeida, Joubert de Castro Lima, Marco Antonio M. Carvalho
Summary: In the past 35 years, parallel computing has gained increasing interest in the academic community, particularly in addressing complex optimization problems. This survey focuses on the use of high-performance computing techniques to design and implement trajectory-based metaheuristics. It provides a comprehensive overview of the current state of the art in multi-core and distributed trajectory-based metaheuristics, introducing basic concepts of high-performance computing and reviewing different taxonomies for architectures and metaheuristics. The survey also presents a summary and classification of 127 publications, identifies research gaps, and discusses past and future trends.
ACM COMPUTING SURVEYS
(2023)