Article

Accelerating molecular dynamics simulations using Graphics Processing Units with CUDA

Journal

COMPUTER PHYSICS COMMUNICATIONS
Volume 179, Issue 9, Pages 634-641

Publisher

ELSEVIER
DOI: 10.1016/j.cpc.2008.05.008

Keywords

Graphics processing unit; Molecular dynamics; Advanced computer architecture

Molecular dynamics is an important computational tool for simulating and understanding biochemical processes at the atomic level. However, accurate simulation of processes such as protein folding requires both a large number of atoms and a large number of time steps, which in turn leads to huge runtime requirements. Finding fast solutions is therefore of the highest importance to research. In this paper we present a new approach to accelerating molecular dynamics simulations with inexpensive commodity graphics hardware. To derive an efficient mapping onto this type of computer architecture, we used the new Compute Unified Device Architecture (CUDA) programming interface to implement a new parallel algorithm. Our experimental results show that the graphics-card-based approach achieves speedups of up to a factor of nineteen compared to the corresponding sequential implementation. (C) 2008 Elsevier B.V. All rights reserved.
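The abstract compares the CUDA version against a sequential implementation but does not state the force model. As a hedged illustration of the kind of inner loop such work parallelizes, the sketch below computes pairwise Lennard-Jones forces with a cutoff in plain Python. The Lennard-Jones potential, the parameter names `epsilon`, `sigma`, and `cutoff` (reduced units), the function name `lj_forces`, and the absence of periodic boundaries are all assumptions made for this example, not details taken from the paper.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def lj_forces(positions: Sequence[Vec3], epsilon: float = 1.0,
              sigma: float = 1.0, cutoff: float = 2.5) -> List[Vec3]:
    """Sequential Lennard-Jones forces (reduced units, no periodic boundaries).

    U(r) = 4*epsilon*((sigma/r)**12 - (sigma/r)**6), truncated at `cutoff`.
    """
    n = len(positions)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    cutoff2 = cutoff * cutoff
    sigma2 = sigma * sigma
    for i in range(n):
        xi, yi, zi = positions[i]
        for j in range(i + 1, n):  # visit each unordered pair once
            dx = xi - positions[j][0]
            dy = yi - positions[j][1]
            dz = zi - positions[j][2]
            r2 = dx * dx + dy * dy + dz * dz
            if r2 == 0.0 or r2 >= cutoff2:
                continue  # skip overlapping atoms and pairs beyond the cutoff
            inv_r2 = 1.0 / r2
            s6 = (sigma2 * inv_r2) ** 3  # (sigma/r)^6
            # |F|/r = 24*epsilon*(2*(sigma/r)^12 - (sigma/r)^6) / r^2
            f_over_r = 24.0 * epsilon * (2.0 * s6 * s6 - s6) * inv_r2
            fx, fy, fz = f_over_r * dx, f_over_r * dy, f_over_r * dz
            # Newton's third law: equal and opposite forces on i and j
            forces[i][0] += fx
            forces[i][1] += fy
            forces[i][2] += fz
            forces[j][0] -= fx
            forces[j][1] -= fy
            forces[j][2] -= fz
    return [(f[0], f[1], f[2]) for f in forces]
```

In a GPU port, the outer loop over `i` is the natural unit of parallel work (for example, one thread per atom). Such a layout typically accumulates the full force sum for atom `i` within its own thread and drops the update of atom `j` shown here, trading redundant arithmetic for the absence of conflicting writes.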

Recommended

Article Computer Science, Theory & Methods

Automatic Generation of High-Performance Convolution Kernels on ARM CPUs for Deep Learning

Jintao Meng, Chen Zhuang, Peng Chen, Mohamed Wahib, Bertil Schmidt, Xiao Wang, Haidong Lan, Dou Wu, Minwen Deng, Yanjie Wei, Shengzhong Feng

Summary: FastConv is an open-source, template-based code auto-generation library that generates high-performance deep learning convolution kernels for arbitrary matrix/tensor shapes. It addresses the challenge of optimizing convolution layers of different shapes and achieves performance portability by automatically selecting the best combination of kernel shapes, cache tiles, loop orders, packing strategies, access patterns, and computations. On the Kunpeng 920 CPU, FastConv outperforms NNPACK, ARM NN, and FeatherCNN, with speedups ranging from 1.02x to 2.48x. It also demonstrates performance portability on various convolution shapes, achieving significant speedups over NNPACK and ARM NN using Winograd on Kunpeng 920 as well as on other CPU architectures such as Snapdragon, Apple M1, and AWS Graviton2.

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS (2022)

Article Computer Science, Theory & Methods

Optimization of Reactive Force Field Simulation: Refactor, Parallelization, and Vectorization for Interactions

Ping Gao, Xiaohui Duan, Bertil Schmidt, Wusheng Zhang, Lin Gan, Haohuan Fu, Wei Xue, Weiguo Liu, Guangwen Yang

Summary: Molecular dynamics simulations have become increasingly important in various fields. By optimizing the computation of interactions, the authors achieve significantly faster simulations, and they propose a method to eliminate write conflicts that yields a further significant speedup. Compared to other software packages, their implementation simulates large numbers of atoms on a large-scale cluster with high efficiency.

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS (2022)

Article Computer Science, Theory & Methods

FMapper: Scalable read mapper based on succinct hash index on SunWay TaihuLight

Kai Xu, Xiaohui Duan, Andre Muller, Robin Kobus, Bertil Schmidt, Weiguo Liu

Summary: This paper introduces FMapper, a highly scalable read mapper optimized for the SW26010 many-core architecture of the TaihuLight supercomputer. By implementing dynamic task scheduling, asynchronous I/O and data transfers, and a vectorized version of the banded Myers algorithm tailored to the 256-bit vector registers of the SW26010, FMapper outperforms other read mappers in the performance evaluation.
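The summary mentions a vectorized banded Myers algorithm. As a hedged sketch of the underlying bit-parallel idea only, the function below implements the classic, unbanded Myers bit-vector recurrence (in Hyyrö's formulation) for semi-global matching: it returns the smallest edit distance between `pattern` and any substring of `text`. Python's arbitrary-precision integers stand in for the SW26010 vector registers; the function name `myers_min_distance` is hypothetical, and FMapper's banding and vectorization are not reproduced.

```python
from typing import Dict


def myers_min_distance(pattern: str, text: str) -> int:
    """Minimum edit distance of `pattern` against any substring of `text`.

    Bit i of each vector refers to row i+1 of the dynamic-programming column.
    """
    m = len(pattern)
    if m == 0:
        return 0
    mask = (1 << m) - 1          # keep vectors m bits wide
    last = 1 << (m - 1)          # bit of the last pattern row
    peq: Dict[str, int] = {}     # match masks: bit i set if pattern[i] == c
    for i, c in enumerate(pattern):
        peq[c] = peq.get(c, 0) | (1 << i)
    pv, mv = mask, 0             # vertical +1 / -1 deltas of the first column
    score = best = m             # D[m][0] = m (empty substring)
    for c in text:
        eq = peq.get(c, 0)
        xv = eq | mv
        xh = (((eq & pv) + pv) ^ pv) | eq
        ph = mv | (~(xh | pv) & mask)   # horizontal +1 deltas
        mh = pv & xh                    # horizontal -1 deltas
        if ph & last:
            score += 1
        elif mh & last:
            score -= 1
        # Shift without carrying in a 1: the text start is free (semi-global).
        ph = (ph << 1) & mask
        mh = (mh << 1) & mask
        pv = mh | (~(xv | ph) & mask)
        mv = ph & xv
        best = min(best, score)
    return best
```

Each text character costs a constant number of word operations regardless of pattern length (as long as the pattern fits in a machine word), which is what makes the recurrence attractive for wide SIMD registers.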

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING (2022)

Article Computer Science, Theory & Methods

General-purpose GPU hashing data structures and their application in accelerated genomics

Daniel Juenger, Robin Kobus, Andre Mueller, Christian Hundt, Kai Xu, Weiguo Liu, Bertil Schmidt

Summary: Hash maps are versatile data structures widely used in data analytics and artificial intelligence. The WarpCore framework aims to improve both the versatility and the performance of GPU hash maps, and it provides acceleration for bioinformatics applications.
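The summary does not describe WarpCore's internals. Assuming an open-addressing design, the following single-threaded sketch shows a fixed-capacity table with linear probing; the class name `LinearProbingMap`, the multiplicative hash constant, and the probing scheme are illustrative choices, not WarpCore's API. Deletion is omitted for brevity.

```python
from typing import List, Optional


class LinearProbingMap:
    """Fixed-capacity open-addressing map from non-negative ints to ints."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.keys: List[Optional[int]] = [None] * capacity  # None marks an empty slot
        self.values: List[int] = [0] * capacity

    def _home(self, key: int) -> int:
        # Knuth-style multiplicative hash, reduced to a slot index
        return ((key * 2654435761) & 0xFFFFFFFF) % self.capacity

    def insert(self, key: int, value: int) -> bool:
        """Insert or update; returns False if the table is full."""
        start = self._home(key)
        for step in range(self.capacity):
            slot = (start + step) % self.capacity
            if self.keys[slot] is None or self.keys[slot] == key:
                self.keys[slot] = key
                self.values[slot] = value
                return True
        return False

    def get(self, key: int) -> Optional[int]:
        start = self._home(key)
        for step in range(self.capacity):
            slot = (start + step) % self.capacity
            if self.keys[slot] is None:
                return None  # an empty slot ends the probe sequence
            if self.keys[slot] == key:
                return self.values[slot]
        return None
```

On a GPU, the check-then-write on an empty slot would typically become an atomic compare-and-swap so that concurrent threads cannot claim the same slot.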

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING (2022)

Article Computer Science, Interdisciplinary Applications

RNACache: A scalable approach to rapid transcriptomic read mapping using locality sensitive hashing

Julian Cascitti, Stefan Niebler, Andre Mueller, Bertil Schmidt

Summary: RNACache is a novel approach based on context-aware locality-sensitive hashing for detecting local similarities between transcriptomes and RNA-seq reads. It consists of a three-step processing pipeline that accurately identifies truly expressed transcript isoforms and offers better performance and scalability than other lightweight mapping tools.
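As a hedged illustration of the basic building block behind locality-sensitive hashing of sequences, the sketch below uses MinHash, a classic LSH scheme for Jaccard similarity, over the k-mers of two sequences. The k-mer length, the number of hash functions, the use of seeded SHA-1 digests as the hash family, and all function names are assumptions for this example; RNACache's context-aware scheme and its three-step pipeline are not reproduced.

```python
import hashlib
from typing import List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """All length-k substrings of `seq` (assumes len(seq) >= k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def _hash(seed: int, kmer: str) -> int:
    # One member of a family of hash functions, selected by `seed`
    digest = hashlib.sha1(f"{seed}:{kmer}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(seq: str, k: int = 4, num_hashes: int = 64) -> List[int]:
    """For each hash function, keep the minimum hash over all k-mers."""
    grams = kmers(seq, k)
    return [min(_hash(s, g) for g in grams) for s in range(num_hashes)]


def estimate_jaccard(sig_a: List[int], sig_b: List[int]) -> float:
    """The fraction of agreeing minima estimates |A & B| / |A | B|."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def exact_jaccard(a: str, b: str, k: int = 4) -> float:
    ka, kb = kmers(a, k), kmers(b, k)
    return len(ka & kb) / len(ka | kb)
```

Two sequences whose signatures agree on many positions are likely to share many k-mers, so signatures (or bands of them) can serve as hash-table keys for fast candidate lookup instead of full alignment.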

JOURNAL OF COMPUTATIONAL SCIENCE (2022)

Article Biochemical Research Methods

Locality-sensitive hashing enables efficient and scalable signal classification in high-throughput mass spectrometry raw data

Konstantin Bob, David Teschner, Thomas Kemmer, David Gomez-Zepeda, Stefan Tenzer, Bertil Schmidt, Andreas Hildebrandt

Summary: This study demonstrates the effectiveness of locality-sensitive hashing for signal classification in mass spectrometry raw data, achieving superior performance by balancing false-positive and false-negative rates through appropriate algorithm parameters. When processing large-scale mass spectrometry data, the approach significantly reduces data size while preserving important information.

BMC BIOINFORMATICS (2022)

Article Biochemical Research Methods

CARE 2.0: reducing false-positive sequencing error corrections using machine learning

Felix Kallenborn, Julian Cascitti, Bertil Schmidt

Summary: This article presents CARE 2.0, a context-aware read error correction tool based on multiple sequence alignment for Illumina datasets. With the use of new optimizations and a classifier based on random decision forests, CARE 2.0 reduces false-positive corrections significantly and achieves high numbers of true-positive corrections. The results demonstrate the applicability of CARE 2.0 in improving k-mer analysis and de novo assembly with real-world data.

BMC BIOINFORMATICS (2022)

Article Computer Science, Theory & Methods

Redesigning and Optimizing UCSF DOCK3.7 on Sunway TaihuLight

Kai Xu, Jinxiao Zhang, Xiaohui Duan, Xiaobo Wan, Niu Huang, Bertil Schmidt, Weiguo Liu, Guangwen Yang

Summary: This paper presents the porting and optimization of UCSF DOCK3.7 on the Sunway TaihuLight supercomputer. Several strategies, such as the producer-consumer strategy, a new binary file format, and ligand orientation scoring optimization, are employed to improve the performance and efficiency of molecular docking.

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS (2022)

Article Computer Science, Theory & Methods

Bio-ESMD: A Data Centric Implementation for Large-Scale Biological System Simulation on Sunway TaihuLight Supercomputer

Xiaohui Duan, Qi Shao, Junben Weng, Bertil Schmidt, Lin Gan, Guohui Li, Haohuan Fu, Wei Xue, Weiguo Liu, Guangwen Yang

Summary: This paper presents a new MD implementation named Bio-ESMD, which improves computational efficiency by reorganizing the cell list data structure to adopt bond lists with guaranteed data locality. Compared to SW_GROMACS, the implementation achieves speedups of more than 2x on Sunway TaihuLight and exhibits linear weak-scaling efficiency, simulating systems of 308.8 million atoms at 1.33 ns/day or 14.44 million atoms at 17.28 ns/day.
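For readers unfamiliar with cell lists, here is a hedged, minimal sketch of the conventional structure that Bio-ESMD is described as reorganizing (it does not implement Bio-ESMD's bond-list layout). Atoms are binned into cubic cells whose edge equals the cutoff, so each atom only needs to be compared with atoms in its own and the 26 adjacent cells. The function name `cell_list_pairs` and the non-periodic box are assumptions for illustration.

```python
import math
from collections import defaultdict
from itertools import product
from typing import Dict, List, Sequence, Set, Tuple

Vec3 = Tuple[float, float, float]
Cell = Tuple[int, int, int]


def cell_list_pairs(positions: Sequence[Vec3], cutoff: float) -> Set[Tuple[int, int]]:
    """Return all index pairs (i, j), i < j, closer than `cutoff` (open box)."""
    # 1. Bin atoms into cubic cells of edge length `cutoff`.
    cells: Dict[Cell, List[int]] = defaultdict(list)
    for idx, (x, y, z) in enumerate(positions):
        key = (math.floor(x / cutoff), math.floor(y / cutoff), math.floor(z / cutoff))
        cells[key].append(idx)
    cutoff2 = cutoff * cutoff
    pairs: Set[Tuple[int, int]] = set()
    # 2. Compare each atom only with atoms in its own and the 26 neighbouring cells.
    for (cx, cy, cz), members in cells.items():
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            # .get avoids inserting empty cells while iterating over `cells`
            neighbours = cells.get((cx + dx, cy + dy, cz + dz), [])
            for i in members:
                for j in neighbours:
                    if i < j:
                        d2 = sum((a - b) ** 2 for a, b in zip(positions[i], positions[j]))
                        if d2 < cutoff2:
                            pairs.add((i, j))
    return pairs
```

Because any pair within the cutoff lies in the same or adjacent cells, the search stays exact while the work per atom depends on local density rather than on the total number of atoms.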

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS (2023)

Article Biochemical Research Methods

RabbitFX: Efficient Framework for FASTA/Q File Parsing on Modern Multi-Core Platforms

Hao Zhang, Honglei Song, Xiaoming Xu, Qixin Chang, Mingkai Wang, Yanjie Wei, Zekun Yin, Bertil Schmidt, Weiguo Liu

Summary: The continuous growth of generated sequencing data has driven the development of many bioinformatics tools. However, many of these tools are limited by slow file parsing. This motivates the design of RabbitFX, a framework that efficiently parses sequencing data on modern multi-core systems. It provides optimized formatting implementations and user-friendly APIs that can be integrated into applications to increase file parsing speed. Integrating RabbitFX into three I/O-intensive applications yields significant speedups over the original versions. RabbitFX is open-source software available at https://github.com/RabbitBio/RabbitFX.

IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS (2023)

Article Computer Science, Interdisciplinary Applications

CUDA-accelerated protein electrostatics in linear space

Thomas Kemmer, Sebastian Hack, Bertil Schmidt, Andreas Hildebrandt

Summary: Protein interactions are crucial for understanding biological function. This study presents an implicit, yet exact representation for dense and asymmetric system matrices of boundary element methods (BEM) for nonlocal protein electrostatics, allowing for the analysis of protein surface meshes with large numbers of elements in memory-limited environments.

JOURNAL OF COMPUTATIONAL SCIENCE (2023)

Article Computer Science, Information Systems

DeepFilter: A Deep Learning Based Variant Filter for VarDict

Hao Zhang, Zekun Yin, Yanjie Wei, Bertil Schmidt, Weiguo Liu

Summary: With the development of sequencing technologies, somatic mutation analysis has become important in cancer research and treatment. VarDict is commonly used for this task, but it may detect false positive variants. To address this problem, we propose DeepFilter, a deep-learning based filter for VarDict, which can effectively filter out false positive variants.

TSINGHUA SCIENCE AND TECHNOLOGY (2023)

Article Genetics & Heredity

MetaTransformer: deep metagenomic sequencing read classification using self-attention models

Alexander Wichmann, Etienne Buschong, Andre Mueller, Daniel Juenger, Andreas Hildebrandt, Thomas Hankeln, Bertil Schmidt

Summary: Deep learning has had a significant impact on scientific research, and this paper introduces MetaTransformer, a self-attention-based deep learning tool for metagenomic analysis. MetaTransformer outperforms previous methods in species- and genus-level classification and achieves improved performance and reduced memory consumption through different embedding schemes.

NAR GENOMICS AND BIOINFORMATICS (2023)

Article Computer Science, Theory & Methods

Redesign and Accelerate the AIREBO Bond-Order Potential on the New Sunway Supercomputer

Ping Gao, Xiaohui Duan, Bertil Schmidt, Wubing Wan, Jiaxu Guo, Wusheng Zhang, Lin Gan, Haohuan Fu, Wei Xue, Weiguo Liu, Guangwen Yang

Summary: This article introduces a method for simulating carbon and hydrocarbon systems with the AIREBO potential in LAMMPS on the new Sunway supercomputer. By implementing a parallel two-level building scheme, a periodic buffering strategy, and optimized nearest-neighbor access algorithms, it achieves efficient simulation and computation.

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS (2023)

Proceedings Paper Computer Science, Hardware & Architecture

Online Event Selection for Mu3e using GPUs

Valentin Henkys, Bertil Schmidt, Niklaus Berger

Summary: The Mu3e experiment aims to search for physics beyond the Standard Model by observing the decay products of muons from a high-intensity muon beam. An online event selection algorithm based on simple geometric selection and reconstruction methods reduces the data rate and meets the targeted performance requirements.

2022 21ST INTERNATIONAL SYMPOSIUM ON PARALLEL AND DISTRIBUTED COMPUTING (ISPDC 2022) (2022)

Article Computer Science, Interdisciplinary Applications

Modeling and meshing for tokamak edge plasma simulations

Usman Riaz, E. Seegyoung Seol, Robert Hager, Mark S. Shephard

Summary: The accurate representation and effective discretization of a problem domain into a mesh are crucial for achieving high-quality simulation results and computational efficiency. This work presents recent developments in extending an automated tokamak modeling and meshing infrastructure to better support the near-flux-field-following meshing requirements of the XGC gyrokinetic code.

COMPUTER PHYSICS COMMUNICATIONS (2024)

Article Computer Science, Interdisciplinary Applications

Electron-phonon coupling from GW perturbation theory: Practical workflow combining BerkeleyGW, ABINIT, and EPW

Zhenglu Li, Gabriel Antonius, Yang-Hao Chan, Steven G. Louie

Summary: This article presents a workflow for practical calculations of electron-phonon coupling and includes the effect of many-electron correlations using GW perturbation theory. The workflow combines different software packages to enable accurate calculations at the level of quasiparticle band structures.

COMPUTER PHYSICS COMMUNICATIONS (2024)

Article Computer Science, Interdisciplinary Applications

MASAP: A package for atomic scattering amplitude in solids

Akihiro Koide, Sara Rabouli, Pierre Le Meur, Sylvain Tricot, Philippe Schieffer, Didier Sebilleau, Calogero R. Natoli

Summary: We present the MsSpec Atomic Scattering Amplitude Package (MASAP), which includes a computation program and a graphical interface for generating atomic scattering amplitude (ASA). The study investigates the applicability of plane wave (PW) and curved spherical wave (SW) scattering in describing electron propagation. The results show that the imaginary part of the optical potential enhances the elastic scattering in the forward direction but causes damping effects in other directions.

COMPUTER PHYSICS COMMUNICATIONS (2024)

Article Computer Science, Interdisciplinary Applications

A Bi-directional method for evaluating integrals involving higher transcendental functions. HyperRAF: A Julia package for new hyper-radial functions

A. Bagci, Gustavo A. Aucar

Summary: The electron repulsion integrals over Slater-type orbitals with non-integer principal quantum numbers are investigated in this study. These integrals are important in calculations of many-electron systems. New relationships free from hypergeometric functions are derived to simplify the calculations. With the use of auxiliary functions and straightforward recurrence relationships, these integrals can be computed efficiently, providing initial conditions for the evaluation of expectation values and potentials.

COMPUTER PHYSICS COMMUNICATIONS (2024)

Article Computer Science, Interdisciplinary Applications

First-principles calculations of specular reflection of high-energy electrons during the two-dimensional crystal growth

Andrzej Daniluk

Summary: RHEED_DIFF_2D is an open-source software for qualitative numerical simulations of RHEED oscillation intensity changes with layer deposition, used for interpreting heteroepitaxial structures under different scattering crystal potential models.

COMPUTER PHYSICS COMMUNICATIONS (2024)

Article Computer Science, Interdisciplinary Applications

An incremental singular value decomposition approach for large-scale spatially parallel & distributed but temporally serial data - applied to technical flows

Niklas Kuehl, Hendrik Fischer, Michael Hinze, Thomas Rung

Summary: The article presents a strategy and algorithm for simulation-accompanying, incremental Singular Value Decomposition (SVD) for time-evolving, spatially parallel discrete data sets. The proposed method improves computational efficiency by introducing a bunch matrix, resulting in higher accuracy and practical applicability.

COMPUTER PHYSICS COMMUNICATIONS (2024)

Article Computer Science, Interdisciplinary Applications

TRAVOLTA: GPU acceleration and algorithmic improvements for constructing quantum optimal control fields in photo-excited systems

Jose M. Rodriguez-Borbon, Xian Wang, Adrian P. Dieguez, Khaled Z. Ibrahim, Bryan M. Wong

Summary: This paper presents an open-source software package called TRAVOLTA for massively parallelized quantum optimal control calculations on GPUs. The TRAVOLTA package is an improvement on the previous NIC-CAGE algorithm and incorporates algorithmic improvements for faster convergence. Three different variants of GPU parallelization are examined to evaluate their performance in constructing optimal control fields in various quantum systems. The benchmarks show that the GPU-enhanced TRAVOLTA code produces the same results as previous CPU-based algorithms but with a speedup of more than ten times. The GPU enhancements and algorithmic improvements allow large quantum optimal control calculations to be efficiently executed on modern multi-core computational hardware.

COMPUTER PHYSICS COMMUNICATIONS (2024)

Article Computer Science, Interdisciplinary Applications

MCNOX: A code for computing and interpreting ultrafast nonlinear X-ray spectra of molecules at the multiconfigurational level

Weijie Hua

Summary: This work introduces a program called MCNOX for computing and analyzing ultrafast nonlinear X-ray spectra. It is designed for cutting-edge applications in photochemistry/photophysics enabled by X-ray free-electron lasers and high harmonic generation light sources. The program can calculate steady-state X-ray absorption spectroscopy and three types of ultrafast nonlinear X-ray spectra, and it is capable of identifying major electronic transitions and providing physical and chemical insights from complex signals.

COMPUTER PHYSICS COMMUNICATIONS (2024)

Article Computer Science, Interdisciplinary Applications

PLQ-sim: A computational tool for simulating photoluminescence quenching dynamics in organic donor/acceptor blends

Leandro Benatto, Omar Mesquita, Lucimara S. Roman, Rodrigo B. Capaz, Graziani Candiotto, Marlus Koehler

Summary: Photoluminescence Quenching Simulator (PLQ-Sim) is a user-friendly software for studying the dynamics of photoexcited states at the interface between organic semiconductors. It provides important information on organic photovoltaic and photothermal devices and calculates transfer rates and quenching efficiency.

COMPUTER PHYSICS COMMUNICATIONS (2024)

Article Computer Science, Interdisciplinary Applications

A method of calculating bandstructure in real-space with application to all-electron and full potential

Dongming Li, James Kestyn, Eric Polizzi

Summary: This study introduces a practical and efficient approach to calculate the all-electron full potential band structure in real space using a finite element basis. Instead of the k-space method, this method solves the Kohn-Sham equation self-consistently within a larger finite system enclosing the unit-cell. Non-self-consistent calculations are then performed in the Brillouin zone to obtain the band structure results, which are found to be in excellent agreement with the pseudopotential k-space method. Furthermore, the study successfully observes the band bending of core electrons.

COMPUTER PHYSICS COMMUNICATIONS (2024)

Article Computer Science, Interdisciplinary Applications

EUTERPE: A global gyrokinetic code for stellarator geometry

R. Kleiber, M. Borchardt, R. Hatzky, A. Koenies, H. Leyh, A. Mishchenko, J. Riemann, C. Slaby, J. M. Garcia-Regana, E. Sanchez, M. Cole

Summary: This paper describes the current state of the EUTERPE code, focusing on the implemented models and their numerical implementation. The code is capable of solving the multi-species electromagnetic gyrokinetic equations in a three-dimensional domain. It utilizes noise reduction techniques and grid resolution transformation for efficient computation. Additionally, various hybrid models are implemented for comparison and the study of plasma-particle interactions. The code is parallelized for high scalability on multiple CPUs.

COMPUTER PHYSICS COMMUNICATIONS (2024)

Article Computer Science, Interdisciplinary Applications

SMIwiz: An integrated toolbox for multidimensional seismic modelling and imaging

Pengliang Yang

Summary: This paper presents an open source software called SMIwiz, which combines seismic modelling, reverse time migration, and full waveform inversion into a unified computer implementation. SMIwiz supports both 2D and 3D simulations and provides various computational recipes for efficient calculation. Its independent processing and batchwise job scheduling ensure scalability, and its viability is demonstrated through applications on benchmark models.

COMPUTER PHYSICS COMMUNICATIONS (2024)

Article Computer Science, Interdisciplinary Applications

Generating and grading 34 optimised norm-conserving Vanderbilt pseudopotentials for actinides and super-heavy elements in the PseudoDojo

Christian Tantardini, Miroslav Ilias, Matteo Giantomassi, Alexander G. Kvashnin, Valeria Pershina, Xavier Gonze

Summary: Material discovery has been an active research field, and this study focuses on developing pseudopotentials for actinides and super-heavy elements. These pseudopotentials are crucial for accurate first-principles calculations and simulations.

COMPUTER PHYSICS COMMUNICATIONS (2024)

Article Computer Science, Interdisciplinary Applications

Generalisation of splitting methods based on modified potentials to nonlinear evolution equations of parabolic and Schrödinger type

S. Blanes, F. Casas, C. Gonzalez, M. Thalhammer

Summary: This paper explores the extension of modified potential operator splitting methods to specific classes of nonlinear evolution equations. Numerical experiments confirm the advantages of the proposed fourth-order modified operator splitting method over traditional splitting methods in dealing with Gross-Pitaevskii systems.

COMPUTER PHYSICS COMMUNICATIONS (2024)

Article Computer Science, Interdisciplinary Applications

Pole-fitting for complex functions: Enhancing standard techniques by artificial-neural-network classifiers and regressors

Siegfried Kaidisch, Thomas U. Hilger, Andreas Krassnigg, Wolfgang Lucha

Summary: Motivated by a use case in theoretical hadron physics, this paper revisits an application of a pole-sum fit to dressing functions of a confined quark propagator. Specifically, it investigates approaches to determine the number and positions of singularities closest to the origin for a function known numerically on a specific grid on the positive real axis. Comparing the efficiency of standard techniques to a pure artificial-neural-network approach and a combination of both, it finds that the combined approach is more efficient. This approach can be applied to similar situations where the positions of poles need to be estimated quickly and reliably from real-axis information alone.

COMPUTER PHYSICS COMMUNICATIONS (2024)