Article

SURAA: A Novel Method and Tool for Loadbalanced and Coalesced SpMV Computations on GPUs

Journal

APPLIED SCIENCES-BASEL
Volume 9, Issue 5

Publisher

MDPI
DOI: 10.3390/app9050947

Keywords

sparse matrix-vector multiplication (SpMV); high performance computing (HPC); graphics processing units; general-purpose computing on graphics processing units (GPGPUs); iterative methods; data analysis; sparse matrix storage; load balancing; coalesced memory access; thread divergence; Freedman-Diaconis rule

Funding

  1. Deanship of Scientific Research (DSR) at the King Abdulaziz University (KAU), Jeddah, Saudi Arabia [G-651-611-38]

Sparse matrix-vector multiplication (SpMV) is a vital building block for numerous scientific and engineering applications. This paper proposes SURAA (Arabic for "speed"), a novel method for SpMV computations on graphics processing units (GPUs). The novelty lies in the way we group matrix rows into segments and adaptively schedule those segments to different types of kernels. The sparse matrix data structure is created by sorting the rows of the matrix by the number of nonzero elements per row (npr) and then using the Freedman-Diaconis rule to form segments of equal size, each containing rows with approximately the same npr. The segments are assembled into three groups based on the mean npr of each segment. For each group, we use multiple kernels to execute the group's segments on different streams, so the number of threads used to execute each segment is chosen adaptively. Dynamic Parallelism, available on NVIDIA GPUs, is used to execute the group containing the segments with the largest mean npr. This provides improved load balancing and coalesced memory access, and hence more efficient SpMV computations on GPUs. In this way, SURAA minimizes the adverse effects of npr variance by distributing the load uniformly across equal-sized segments. We implement the SURAA method as a tool and compare its performance with the de facto best commercial (cuSPARSE) and open-source (CUSP, MAGMA) tools, using widely used benchmarks comprising 26 high-npr-variance matrices from 13 diverse domains. SURAA outperforms the other tools, delivering a 13.99x speedup on average. We believe that our approach provides a fundamental shift in addressing SpMV-related challenges on GPUs, including coalesced memory access, thread divergence, and load balancing, and is set to open new avenues for further improving SpMV performance in the future.
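
The row-grouping step described in the abstract can be illustrated with a minimal host-side sketch. This is not the authors' implementation: it assumes a CSR row-pointer array as input, reads "segments" as Freedman-Diaconis histogram bins over npr (bin width h = 2 · IQR / n^(1/3)), and uses hypothetical thresholds `low` and `high` to split segments into three groups. The function names are likewise illustrative, and the GPU kernels, streams, and Dynamic Parallelism are not shown.

```python
from statistics import quantiles


def freedman_diaconis_width(values):
    """Freedman-Diaconis bin width: h = 2 * IQR / n^(1/3)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return 2.0 * (q3 - q1) / (len(values) ** (1.0 / 3.0))


def build_segments(row_ptr):
    """Sort rows by nonzeros per row (npr) and bin them into segments."""
    npr = [row_ptr[i + 1] - row_ptr[i] for i in range(len(row_ptr) - 1)]
    order = sorted(range(len(npr)), key=lambda r: npr[r])
    width = freedman_diaconis_width(npr)
    if width <= 0:
        width = 1.0  # assumption: fall back to unit-width bins when IQR is 0
    lowest = npr[order[0]]
    bins = {}
    for r in order:
        bins.setdefault(int((npr[r] - lowest) // width), []).append(r)
    # Segments are returned in order of increasing npr.
    return [bins[k] for k in sorted(bins)], npr


def group_segments(segments, npr, low=8, high=64):
    """Split segments into three groups by mean npr.

    The thresholds `low` and `high` are hypothetical, chosen only for
    illustration; the source does not state how the groups are delimited.
    """
    groups = {"small": [], "medium": [], "large": []}
    for seg in segments:
        mean_npr = sum(npr[r] for r in seg) / len(seg)
        if mean_npr < low:
            groups["small"].append(seg)
        elif mean_npr < high:
            groups["medium"].append(seg)
        else:
            groups["large"].append(seg)
    return groups


# Example: 8 rows with npr = [1, 2, 2, 3, 100, 4, 1, 2]
row_ptr = [0, 1, 3, 5, 8, 108, 112, 113, 115]
segments, npr = build_segments(row_ptr)
groups = group_segments(segments, npr)
```

In this toy input, the single very dense row (row 4, with 100 nonzeros) ends up alone in the "large" group, which is the group the abstract says is handled with Dynamic Parallelism, while the short rows fall into the "small" group.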

Recommended

Article Engineering, Manufacturing

Optimal Cyclic Scheduling of Wafer-Residency-Time-Constrained Dual-Arm Cluster Tools by Configuring Processing Modules and Robot Waiting Time

Jufeng Wang, Chunfeng Liu, MengChu Zhou, Tingting Leng, Aiiad Albeshri

Summary: This study selects a proper number of required types of processing modules (PMs) to process wafers, ensuring the highest productivity of a wafer-residency-time-constrained dual-arm cluster tool. It proposes the necessary and sufficient conditions for tool schedulability and develops a polynomial-complexity algorithm for finding an optimal cyclic schedule. Examples are provided to demonstrate its superiority over existing approaches, advancing the field of cluster tool scheduling and promoting green manufacturing of wafers for semiconductor producers.

IEEE TRANSACTIONS ON SEMICONDUCTOR MANUFACTURING (2023)

Article Computer Science, Information Systems

Distributed access control for information-centric networking architectures using verifiable credentials

Bander Alzahrani, Nikos Fotiou, Aiiad Albeshri, Abdullah Almuhaimeed, Khalid Alsubhi

Summary: ICN is an emerging paradigm that enables secure retrieval of content items independent of their location. This paper proposes a solution that allows third-party storage nodes to verify user authorization for accessing specific content items, leveraging Verifiable Credentials to build trust chains and express user capabilities. The solution enables users to prove authorization using a single message integrated into a content request and eliminates the need for verifying entities to store any secrets. It also supports lightweight delegation.

INTERNATIONAL JOURNAL OF INFORMATION SECURITY (2023)

Article Automation & Control Systems

Discriminative Manifold Distribution Alignment for Domain Adaptation

SiYa Yao, Qi Kang, MengChu Zhou, Muhyaddin J. Rawa, Aiiad Albeshri

Summary: This article proposes an efficient discriminative manifold distribution alignment (DMDA) approach, which improves feature transferability by aligning both global and local distributions and refines a discriminative model by learning geometrical structures in manifold space. Extensive experiments show that DMDA outperforms other methods in both classification accuracy and time efficiency in domain adaptation tasks.

IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS (2023)

Article Chemistry, Multidisciplinary

A Knowledge Sharing and Individually Guided Evolutionary Algorithm for Multi-Task Optimization Problems

Xiaoling Wang, Qi Kang, Mengchu Zhou, Zheng Fan, Aiiad Albeshri

Summary: Multi-task optimization (MTO) is a new evolutionary computation paradigm that solves multiple optimization tasks concurrently by utilizing task similarities and historical knowledge. This work proposes the individually guided multi-task optimization (IMTO) framework, which explores each individual to learn from other tasks, selects individuals with higher solving ability, and only inferior individuals learn from other tasks to improve knowledge transfer. The advantage of IMTO over multifactorial evolutionary framework and baseline solvers is verified through benchmark studies.

APPLIED SCIENCES-BASEL (2023)

Article Computer Science, Information Systems

Knowledge Sharing in AI Services: A Market-Based Approach

Thaha Mohammed, Si-Ahmed Naas, Stephan Sigg, Mario Di Francesco

Summary: Today's DNNs are accurate but require large amounts of data. This work proposes a knowledge sharing method that exchanges the weights of pretrained DNNs and applies transfer learning. It uses a market-based approach for optimal knowledge sharing and introduces a weight fusion technique. Evaluation shows that the proposed solution is efficient and significantly improves inference accuracy without the need for federated learning.

IEEE INTERNET OF THINGS JOURNAL (2023)

Article Computer Science, Information Systems

Distributed Assignment With Load Balancing for DNN Inference at the Edge

Yuzhe Xu, Thaha Mohammed, Mario Di Francesco, Carlo Fischione

Summary: This article addresses the problem of DNN inference allocation in edge computing, proposing a realistic DNN inference model and a distributed algorithm to solve it. Experimental results show that the proposed solution significantly outperforms existing techniques in terms of inference time, load balance, and convergence speed.

IEEE INTERNET OF THINGS JOURNAL (2023)

Review Computer Science, Information Systems

Data Locality in High Performance Computing, Big Data, and Converged Systems: An Analysis of the Cutting Edge and a Future System Architecture

Sardar Usman, Rashid Mehmood, Iyad Katib, Aiiad Albeshri

Summary: Big data has transformed science and technology, bringing about societal changes. High-performance computing (HPC) supports big data analysis using AI and related methods. Efforts have been made to combine HPC and big data into converged architectures for improved performance and resource efficiency.

ELECTRONICS (2023)

Article Environmental Sciences

Psychological Health and Drugs: Data-Driven Discovery of Causes, Treatments, Effects, and Abuses

Sarah Alswedani, Rashid Mehmood, Iyad Katib, Saleh M. Altowaijri

Summary: Mental health issues have significant impacts and addressing the root causes is crucial for prevention and sustainability. A holistic approach is needed to understand mental health in the context of social and environmental factors. More research, awareness, and interventions are necessary to address these issues, including studying the effectiveness and risks of medications.

TOXICS (2023)

Article Mathematics, Interdisciplinary Applications

Nonfragile observer-based event-triggered fuzzy tracking control for fast-sampling singularly perturbed systems with dual-layer switching mechanism and cyber-attacks

Fang Guo, Mengzhuo Luo, Jun Cheng, Iyad Katib, Kaibo Shi

Summary: This paper investigates the problem of nonfragile observer-based tracking control for a class of fuzzy fast-sampling singularly perturbed systems with sensor saturation, an event-triggered scheme, and random cyber-attacks. The proposed control protocol improves design flexibility and reduces conservativeness to avoid the asynchronous phenomenon between the systems. By integrating the fuzzy nonfragile observer and the reference model signal into the tracking controller design, the tracking error can be reduced.

CHAOS SOLITONS & FRACTALS (2023)

Article Engineering, Civil

Using Tabu Search to Avoid Concave Obstacles for Source Location

Junqi Zhang, Huan Liu, Peng Zu, Mengshi Zhao, Cheng Wang, Aiiad Albeshri, Abdullah Abusorrah, MengChu Zhou

Summary: Recently, the use of a particle swarm optimizer (PSO) to guide robots in a source location problem has gained attention. Traditional obstacle avoidance strategies are not effective when robots lack prior information. This work proposes a novel PSO based on Tabu Search (PSO-TS) that sets trapping areas as tabu objects to enable robots to locate multiple sources without prior knowledge or expensive hardware.

IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS (2023)

Article Automation & Control Systems

Protocol-based fault detection filtering for memristive neural networks with dynamic quantization

Gang Qin, An Lin, Jun Cheng, Mengjie Hu, Iyad Katib

Summary: This study investigates the issue of event-triggered fault detection filtering for memristive neural networks with dynamic quantization in the discrete-time domain. A novel event-triggered protocol is proposed based on the dynamic quantization parameter, the fault occurrence probability, and the network bandwidth utilization rate, and an asynchronous filter framework is developed to ensure the stability of the system.

JOURNAL OF THE FRANKLIN INSTITUTE-ENGINEERING AND APPLIED MATHEMATICS (2023)

Article Automation & Control Systems

Probabilistic event-triggered protocol for switched power systems under multi-strategy deception attack

Wei Kang, Gang Qin, Jun Cheng, Huaicheng Yan, Iyad Katib, Jinde Cao

Summary: This paper proposes a security control method for a discrete-time switched power system using a probabilistic event-triggered protocol, which effectively optimizes network resource utilization and improves system security and stability under multi-strategy deception attacks.

JOURNAL OF THE FRANKLIN INSTITUTE-ENGINEERING AND APPLIED MATHEMATICS (2023)

Article Computer Science, Artificial Intelligence

Adaptive neural network control for Markov jumping systems against deception attacks

Junhui Wu, Gang Qin, Jun Cheng, Jinde Cao, Huaicheng Yan, Iyad Katib

Summary: This paper proposes an innovative approach to mitigate the effects of deception attacks in Markov jumping systems by developing an adaptive neural network control strategy. The approach effectively approximates the unbounded false signals injected by deception attacks and establishes a connection between the joint Markov chain and controller.

NEURAL NETWORKS (2023)

Article Computer Science, Information Systems

Hybrid Hunter-Prey Optimization with Deep Learning-Based Fintech for Predicting Financial Crises in the Economy and Society

Iyad Katib, Fatmah Y. Assiri, Turki Althaqafi, Zenah Mahmoud Alkubaisy, Diaa Hamed, Mahmoud Ragab, Heung-Il Suk

Summary: Smart Fintech, empowered by data science and artificial intelligence, drives automated, intelligent, personalized financial and economic businesses, playing a crucial role in today's technology-driven society and economies.

ELECTRONICS (2023)

Editorial Material Computer Science, Information Systems

Artificial Intelligence Solutions and Applications for Distributed Systems in Smart Spaces

Juan M. M. Corchado, Sara Rodriguez, Fernando de la Prieta, Pawel Sitek, Vicente Julian, Rashid Mehmood

ELECTRONICS (2023)
