Article

PUMIPic: A mesh-based approach to unstructured mesh Particle-In-Cell on GPUs

Journal

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING
Volume 157, Pages 1-12

Publisher

ACADEMIC PRESS INC ELSEVIER SCIENCE
DOI: 10.1016/j.jpdc.2021.06.004

Keywords

Particle-in-cell; Unstructured mesh; GPU

Funding

  1. National Science Foundation [ACI 1533581]
  2. U.S. Department of Energy, Office of Science [DE-AC52-07NA27344, DE-SC0018275]
  3. Office of Science of the U.S. Department of Energy [DE-AC05-00OR22725]
  4. U.S. Department of Energy (DOE) [DE-SC0018275]

This paper introduces PUMIPic, a framework for efficient and performance-portable mesh-based PIC simulations on GPU systems. Performance evaluation shows that mesh-based PIC can use a partitioned mesh and maintain scalability.

Unstructured mesh particle-in-cell (PIC) simulations running on the current and next generation of massively parallel systems require new methods for both the mesh and the particles to achieve performance and scalability on GPUs. The traditional approach to implementing PIC simulations defines data structures and algorithms in terms of particles, with a full copy of the unstructured mesh on every process. To scale both the unstructured mesh and the particles effectively, mesh-based PIC uses the unstructured mesh as the predominant data structure, with the particles stored in terms of the mesh entities. This paper details the PUMIPic library, a framework for developing efficient and performance-portable mesh-based PIC simulations on GPU systems. A pseudo-physics simulation based on a five-dimensional gyrokinetic code for modeling plasma physics is used to examine the performance of PUMIPic. Scaling studies of the unstructured mesh partition and the number of particles are performed on up to 4096 nodes of the Summit system at Oak Ridge National Laboratory. The studies show that mesh-based PIC can use a partitioned mesh and maintain scaling up to system limitations. (C) 2021 Elsevier Inc. All rights reserved.
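
As an illustration of what storing particles in terms of mesh entities can look like, the following is a minimal sketch in plain Python. It is not PUMIPic's API or its actual data structure; the function names `group_particles_by_element` and `particles_in_element`, and the CSR-style offsets layout, are assumptions chosen only to show the idea of grouping particles by their parent mesh element so that per-element work touches a contiguous range of particles.

```python
def group_particles_by_element(parent_element, num_elements):
    """Group particle ids by parent mesh element (hypothetical helper).

    Returns (offsets, order) in a CSR-style layout: the particles that
    live in element e are order[offsets[e]:offsets[e + 1]].
    """
    # Count how many particles each element holds.
    counts = [0] * num_elements
    for e in parent_element:
        counts[e] += 1

    # Exclusive prefix sum of the counts gives each element's start offset.
    offsets = [0] * (num_elements + 1)
    for e in range(num_elements):
        offsets[e + 1] = offsets[e] + counts[e]

    # Scatter particle ids into their element's contiguous slot range.
    cursor = offsets[:-1].copy()
    order = [0] * len(parent_element)
    for pid, e in enumerate(parent_element):
        order[cursor[e]] = pid
        cursor[e] += 1
    return offsets, order


def particles_in_element(offsets, order, e):
    """Return the ids of the particles stored in element e."""
    return order[offsets[e]:offsets[e + 1]]


# Six particles on a four-element mesh; entry i is the element holding particle i.
parent_element = [2, 0, 2, 1, 0, 2]
offsets, order = group_particles_by_element(parent_element, 4)
print(particles_in_element(offsets, order, 2))
```

With this layout, a loop over mesh elements can process each element's particles together, and a partitioned mesh only needs the particles whose parent elements it owns. Real GPU implementations would use device arrays and parallel scans rather than Python lists.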

Recommended

Article Multidisciplinary Sciences

Effects of plasma turbulence on the nonlinear evolution of magnetic island in tokamak

Minjun J. Choi, Laszlo Bardoczi, Jae-Min Kwon, T. S. Hahm, Hyeon K. Park, Jayhyun Kim, Minho Woo, Byoung-Ho Park, Gunsu S. Yun, Eisung Yoon, George McKee

Summary: The authors provide comprehensive observations and analysis on the evolution of magnetic islands and plasma turbulence in tokamak plasmas, revealing the intricate effects of turbulence on the evolution of magnetic islands.

NATURE COMMUNICATIONS (2021)

Article Physics, Nuclear

Suppression of coherent synchrotron radiation induced emittance growth during electron-beam injection into plasma wakefields

S. -Y. Kim, M. Chung, S. Doebert, E. S. Yoon

Summary: Coherent synchrotron radiation is a collective effect that distorts the phase space of an electron beam, leading to emittance growth when the beam trajectory is bent in a dipole magnet. Suppressing CSR-induced emittance growth is essential for maintaining beam quality during electron beam transport. By controlling Twiss parameters and considering chromatic amplitude, the first-order terms can be dominant along the transfer line, minimizing emittance growth driven by the CSR effect. Failure to properly control CSR can result in significant increases in emittance, particularly during external injection into plasma wakefields.

PHYSICAL REVIEW ACCELERATORS AND BEAMS (2021)

Article Computer Science, Interdisciplinary Applications

A parallel interface tracking approach for evolving geometry problems

Fan Yang, Anirban Chandra, Yu Zhang, Saurabh Tendulkar, Rocco Nastasia, Assad A. Oberai, Mark S. Shephard, Onkar Sahni

Summary: This paper introduces a parallel interface tracking method for evolving geometry problems, using a conforming hybrid/mixed mesh structure with anisotropic layered elements, and employing a combination of mesh motion and modification to update the mesh while maintaining the structure and resolution of the layered elements; experimental results demonstrate the effectiveness of this approach in addressing problems with significant geometric motion or deformation.

ENGINEERING WITH COMPUTERS (2022)

Article Computer Science, Hardware & Architecture

Efficient exascale discretizations: High-order finite element methods

Tzanio Kolev, Paul Fischer, Misun Min, Jack Dongarra, Jed Brown, Veselin Dobrev, Tim Warburton, Stanimire Tomov, Mark S. Shephard, Ahmad Abdelfattah, Valeria Barra, Natalie Beams, Jean-Sylvain Camier, Noel Chalmers, Yohann Dudouit, Ali Karakus, Ian Karlin, Stefan Kerkemeier, Yu-Hsiang Lan, David Medina, Elia Merzari, Aleksandr Obabko, Will Pazner, Thilina Rathnayake, Cameron W. Smith, Lukas Spies, Kasia Swirydowicz, Jeremy Thompson, Ananias Tomboulides, Vladimir Tomov

Summary: Efficient exploitation of exascale architectures requires new numerical algorithms. CEED, a research partnership focused on developing next-generation discretization software, collaborates with various projects and institutions to optimize performance on large-scale GPU architectures and advance algorithms in fields such as unstructured adaptive mesh refinement and high-order data visualization.

INTERNATIONAL JOURNAL OF HIGH PERFORMANCE COMPUTING APPLICATIONS (2021)

Article Computer Science, Hardware & Architecture

The Exascale Framework for High Fidelity coupled Simulations (EFFIS): Enabling whole device modeling in fusion science

Eric Suchyta, Scott Klasky, Norbert Podhorszki, Matthew Wolf, Abolaji Adesoji, C. S. Chang, Jong Choi, Philip E. Davis, Julien Dominski, Stephane Ethier, Ian Foster, Kai Germaschewski, Berk Geveci, Chris Harris, Kevin A. Huck, Qing Liu, Jeremy Logan, Kshitij Mehta, Gabriele Merlo, Shirley Moore, Todd Munson, Manish Parashar, David Pugmire, Mark S. Shephard, Cameron W. Smith, Pradeep Subedi, Lipeng Wan, Ruonan Wang, Shuangxi Zhang

Summary: EFFIS is a framework developed for high-fidelity coupled simulations, enabling users to easily compose and execute workflows with features like strong or weak coupling and in situ analysis. Key technologies utilized include ADIOS, PerfStubs/TAU, and an advanced COUPLER. Demonstrations show minimal overhead for the WDMApp workflow.

INTERNATIONAL JOURNAL OF HIGH PERFORMANCE COMPUTING APPLICATIONS (2022)

Article Physics, Nuclear

Witness electron beam injection using an active plasma lens for a proton beam-driven plasma wakefield accelerator

S. -Y. Kim, K. Moon, M. Chung, K. N. Sjobak, E. Adli, M. Dayyani, S. Doebert, E. S. Yoon, I. Nam, G. Hahn

Summary: An active plasma lens (APL) uses a magnetic field generated by a discharge current through the plasma to simultaneously focus the beam in both horizontal and vertical planes. Research shows that the plasma wakefield excited by proton bunches remains the same through the APL, while the emittance of the witness electron beam increases rapidly in the plasma density ramp regions. However, under certain conditions, the emittance growth is not significant for small emittances such as 2 mm mrad.

PHYSICAL REVIEW ACCELERATORS AND BEAMS (2021)

Article Computer Science, Interdisciplinary Applications

Adaptive workflow for simulation of RF heaters

Morteza H. Siboni, Mark S. Shephard

Summary: This paper presents a workflow for adaptive high-performance simulations of RF fusion systems, utilizing CAD models and high-order finite elements for analysis, and incorporating patch recovery-based error estimators for mesh adaptation.

COMPUTER PHYSICS COMMUNICATIONS (2022)

Article Computer Science, Interdisciplinary Applications

Development of a gyrokinetic hyperbolic solver based on discontinuous Galerkin method in tokamak geometry

Gahyung Jo, Jae-Min Kwon, Janghoon Seo, Eisung Yoon

Summary: A hyperbolic solver is developed for the gyrokinetic equation in tokamak geometry. The effects of basis functions on the numerical solutions and the conservation of physical quantities are investigated. The weighted basis functions show better performance in resolving small scale structures in velocity space.

COMPUTER PHYSICS COMMUNICATIONS (2022)

Article Computer Science, Interdisciplinary Applications

Nonlinear Fokker-Planck collision operator in Rosenbluth form for gyrokinetic simulations using discontinuous Galerkin method

Dongkyu Kim, Janghoon Seo, Gahyung Jo, Jae-Min Kwon, Eisung Yoon

Summary: A gyroaveraged nonlinear collision operator based on the Fokker-Planck operator in the Rosenbluth-MacDonald-Judd potential form is formulated and implemented for gyrokinetic simulations. The density conservation is ensured by preserving the divergence structure of the original RMJ form while neglecting the finite Larmor radius effect. Various collision models, including linear and Dougherty, are also incorporated to evaluate their advantages and disadvantages. The conservation of parallel momentum and energy is enforced numerically using a simple advection-diffusion model.

COMPUTER PHYSICS COMMUNICATIONS (2022)

Article Materials Science, Multidisciplinary

Progress in gyrokinetic validation studies using NBI heated L-mode discharge in KSTAR

D. Kim, J. Kang, M. W. Lee, J. Candy, E. S. Yoon, S. Yi, J. -m. Kwon, Y. -c. Ghim, W. Choe, C. Sung

Summary: Progress in the first gyrokinetic validation study using KSTAR NBI heated L-mode discharge is reported in this paper. The simulated energy flux was under-predicted compared to the experimental energy flux level, and sensitivity to input parameters related to impurity density profile was observed.

CURRENT APPLIED PHYSICS (2022)

Article Computer Science, Interdisciplinary Applications

SANTA: A safety analysis code for neutron absorbers in spent nuclear fuel pools

Geon Kim, Yunsong Jung, Myeongkyu Lee, Eisung Yoon, Sangjoon Ahn

Summary: Recent experimental reports highlight the need for quantitative evaluation of neutron-induced energetic particle emission reactions in neutron absorbers. In response, a Safety Analysis code for NeuTron Absorbers (SANTA) was developed to provide essential parameters for simulating the corrosion of absorbers. The code outputs radiation damage and helium concentration, which are crucial for designing irradiation experiments.

COMPUTER PHYSICS COMMUNICATIONS (2023)

Article Computer Science, Interdisciplinary Applications

Development of an unstructured mesh gyrokinetic particle-in-cell code for exascale fusion plasma simulations on GPUs

Chonglin Zhang, Gerrett Diamond, Cameron W. Smith, Mark S. Shephard

Summary: This paper presents XGCm, a new unstructured mesh gyrokinetic Particle-in-Cell (PIC) code for modeling fusion plasma. XGCm builds on an unstructured mesh-centric infrastructure that is scalable in both the number of mesh elements and particles, and supports generally graded or anisotropic meshes. The methods and algorithms used in the development of XGCm are discussed, which perform all key computing steps on GPU accelerators. Code validation and testing are performed, showing excellent agreement with existing results and demonstrating turbulence growth in different cases. Weak scaling results using the Oak Ridge National Laboratory's Summit supercomputer are also presented.

COMPUTER PHYSICS COMMUNICATIONS (2023)

Article Computer Science, Theory & Methods

MSHGN: Multi-scenario adaptive hierarchical spatial graph convolution network for GPU utilization prediction in heterogeneous GPU clusters

Sheng Wang, Shiping Chen, Fei Meng, Yumei Shi

Summary: This study proposes a Multi-Scenario Adaptive Hierarchical Spatial Graph Convolution Network (MSHGN) model for accurately predicting GPU utilization rates in heterogeneous GPU clusters. By constructing undirected graphs for multiple scenarios and using a Graph Convolution Network (GCN) to capture spatial dependency relationships, the MSHGN model achieves superior accuracy and robustness in predicting resource utilization on a real-world Alibaba dataset.

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING (2024)

Article Computer Science, Theory & Methods

A parallel fractional explicit group modified AOR iterative method for solving fractional Poisson equation with multi-core architecture

Nik Amir Syafiq, Mohamed Othman, Norazak Senu, Fudziah Ismail, Nor Asilah Wati Abdul Hamid

Summary: This research investigates the multi-core architecture for solving the fractional Poisson equation using the modified accelerated overrelaxation (MAOR) scheme. The feasibility of the scheme in a parallel environment was tested through experimental comparisons and measurements. The results showed that the scheme is viable in a parallel environment.

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING (2024)

Article Computer Science, Theory & Methods

Vampire: A smart energy meter for synchronous monitoring in a distributed computer system

Antonio F. Diaz, Beatriz Prieto, Juan Jose Escobar, Thomas Lampert

Summary: This paper presents the design and implementation of a low-cost energy monitoring system that synchronously collects the energy consumption of multiple devices using a specially designed wattmeter, and utilizes widely used technologies and tools in the Internet of Things for implementation.

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING (2024)

Article Computer Science, Theory & Methods

Distributed runtime verification of metric temporal properties

Ritam Ganguly, Yingjie Xue, Aaron Jonckheere, Parker Ljung, Benjamin Schornstein, Borzoo Bonakdarpour, Maurice Herlihy

Summary: This paper presents a centralized runtime monitoring technique for distributed systems, which verifies the correctness of distributed computations by exploiting bounded-skew clock synchronization. By introducing a progression-based formula rewriting scheme and utilizing SMT solving techniques, the metric temporal logic can be monitored and the probabilistic guarantee for verification results can be calculated. Experimental results demonstrate the effectiveness of this technique in different application scenarios.

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING (2024)

Article Computer Science, Theory & Methods

Eventually lattice-linear algorithms

Arya Tanmay Gupta, Sandeep S. Kulkarni

Summary: Lattice-linear systems allow nodes to execute asynchronously. The eventually lattice-linear algorithms introduced in this study guarantee system transitions to optimal states within specified moves, leading to improved performance compared to existing literature. Experimental results further support the benefits of lattice-linearity.

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING (2024)

Article Computer Science, Theory & Methods

ML-driven risk estimation for memory failure in a data center environment with convolutional neural networks, self-supervised data labeling and distribution-based model drift determination

Tim Breitenbach, Shrikanth Malavalli Divakar, Lauritz Rasbach, Patrick Jahnke

Summary: With the trend towards multi-socket server systems, the demand for RAM per server has increased, resulting in more DIMM sockets per server. RAM issues have become a dominant failure pattern for servers due to the probability of failure in each DIMM. This study introduces an ML-driven framework to estimate the probability of memory failure for each RAM module. The framework utilizes structural information between correctable (CE) and uncorrectable errors (UE) and engineering measures to mitigate the impact of UE.

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING (2024)

Article Computer Science, Theory & Methods

Effectively computing high strength mixed covering arrays with constraints

Carlos Ansotegui, Eduard Torres

Summary: This paper presents an incomplete algorithm for efficiently constructing Covering Arrays with Constraints of high strength. The algorithm mitigates memory blow-ups and reduces run-time consumption, providing a practical tool for Combinatorial Testing.

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING (2024)

Article Computer Science, Theory & Methods

Multi-resource scheduling of moldable workflows

Lucas Perotin, Sandhya Kandaswamy, Hongyang Sun, Padma Raghavan

Summary: Resource scheduling is crucial in High-Performance Computing systems, and previous research has mainly focused on a single type of resource. With advancements in hardware and the rise of data-intensive applications, considering multiple resources simultaneously is necessary to improve overall application performance. This study presents a Multi-Resource Scheduling Algorithm (MRSA) that minimizes the makespan of computational workflows by efficiently allocating resources and optimizing scheduling order. Simulation results demonstrate that MRSA outperforms baseline methods in various scenarios.

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING (2024)

Article Computer Science, Theory & Methods

Accelerating block lifecycle on blockchain via hardware transactional memory

Yue Li, Han Liu, Jianbo Gao, Jiashuo Zhang, Zhi Guan, Zhong Chen

Summary: The processing of block lifecycles is crucial to the efficiency of a blockchain. The FASTBLOCK framework, which introduces fine-grained concurrency, accelerates the execution and validation steps. It outperforms state-of-the-art solutions significantly in terms of performance.

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING (2024)

Article Computer Science, Theory & Methods

An evaluation of GPU filters for accelerating the 2D convex hull

Roberto Carrasco, Hector Ferrada, Cristobal A. Navarro, Nancy Hitschfeld

Summary: The experimental evaluation of GPU filters for computing the 2D convex hull shows significant performance improvement. The different point distributions have a noticeable impact on the results, with the greatest improvement seen in the case of uniform and normal distributions.

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING (2024)

Article Computer Science, Theory & Methods

Scheduling independent tasks on multiple cloud-assisted edge servers with energy constraint

Keqin Li

Summary: In this paper, the authors study task scheduling with or without energy constraint in mobile edge computing. They propose heuristic algorithms to solve these problems and analyze them using the methods of communication unification, effective speed concept, and virtual task construction. The experimental results show that the performance of the heuristic algorithms is close to the optimal algorithm. This is the first paper in the literature to optimize the makespan of task scheduling with or without energy constraint in mobile edge computing with multiple cloud-assisted edge servers.

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING (2024)

Article Computer Science, Theory & Methods

Interference-aware opportunistic job placement for shared distributed deep learning clusters

Hongliang Li, Hairui Zhao, Ting Sun, Xiang Li, Haixiao Xu, Keqin Li

Summary: This paper studies the problem of job placement in shared GPU clusters and proposes an opportunistic memory sharing model and algorithms to solve the problem. Extensive experiments on a GPU cluster validate the correctness and effectiveness of the proposed approach.

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING (2024)

Article Computer Science, Theory & Methods

Scalable atomic broadcast: A leaderless hierarchical algorithm

Lucas Ruchel, Edson Tavares de Camargo, Luiz Antonio Rodrigues, Rogerio C. Turchetti, Luciana Arantes, Elias Procopio Duarte Jr.

Summary: LHABcast is a leaderless hierarchical atomic broadcast algorithm that improves scalability by being fully decentralized and hierarchical. It uses local sequence numbers and timestamps to order messages and achieves significantly lower message count compared to an all-to-all strategy, both in fault-free and faulty scenarios.

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING (2024)

Article Computer Science, Theory & Methods

Redactable consortium blockchain based on verifiable distributed chameleon hash functions

Xiangyu Wu, Xuehui Du, Qiantao Yang, Na Wang, Wenjuan Wang

Summary: This paper proposes a new method to address the immutability issue of consortium blockchains by introducing a verifiable distributed chameleon hash (VDCH) function and a consensus protocol called CVTSS based on verifiable threshold signatures. The proposed method enhances the flexibility, fault tolerance, and redaction efficiency of consortium blockchains.

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING (2024)

Article Computer Science, Theory & Methods

Task scheduling optimization in heterogeneous cloud computing environments: A hybrid GA-GWO approach

Ipsita Behera, Srichandan Sobhanayak

Summary: Task scheduling in cloud computing is a challenging problem. This paper proposes a hybrid GA-GWO algorithm that aims to minimize makespan, energy consumption, and cost. Evaluation using the CloudSim toolkit demonstrates the algorithm's effectiveness and efficiency.

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING (2024)