Article

Accelerating sparse matrix-matrix multiplication with GPU Tensor Cores

COMPUTERS & ELECTRICAL ENGINEERING (2020)

Journal

COMPUTERS & ELECTRICAL ENGINEERING

Volume 88

Publisher

PERGAMON-ELSEVIER SCIENCE LTD

DOI: 10.1016/j.compeleceng.2020.106848

Keywords

Sparse matrix multiplication; GPU; Tensor Cores; Parallel computing; SpGEMM

Categories

Computer Science, Hardware & Architecture; Computer Science, Interdisciplinary Applications; Engineering, Electrical & Electronic

Funding

High Performance Soft-tissue Navigation (HIPERNAV - H2020-MSCA-ITN-2016)
European Union [722068]

Abstract

Sparse general matrix-matrix multiplication (spGEMM) is an essential component in many scientific and data analytics applications. However, the sparsity patterns of the input matrices, and the way those patterns interact, make spGEMM challenging. Modern GPUs include Tensor Core Units (TCUs), which specialize in dense matrix multiplication. Our aim is to re-purpose TCUs for sparse matrices. The key idea of our spGEMM algorithm, tSparse, is to multiply sparse rectangular blocks using the mixed precision mode of TCUs. tSparse partitions the input matrices into tiles and operates only on tiles that contain one or more elements. It creates a task list of the tiles and performs matrix multiplication of these tiles using TCUs. To the best of our knowledge, this is the first time that TCUs have been used in the context of spGEMM. We show that spGEMM, with our tiling approach, benefits from TCUs. Our approach significantly improves the performance of spGEMM in comparison to cuSPARSE, CUSP, RMerge2, Nsparse, AC-SpGEMM and spECK.
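
The block-wise scheme described in the abstract can be illustrated with a small CPU-side sketch. This is not the authors' implementation: it is a NumPy illustration that assumes square 16x16 blocks (a common tensor-core fragment shape), emulates mixed precision by rounding inputs to float16 and accumulating in float32, and uses hypothetical helper names (`nonempty_tiles`, `build_task_list`, `tiled_spgemm`) that do not come from the paper.

```python
import numpy as np
from collections import defaultdict

TILE = 16  # assumed block edge; not taken from the paper


def nonempty_tiles(m, tile=TILE):
    """Map (block_row, block_col) -> dense block, keeping only blocks with a nonzero."""
    blocks = {}
    for i in range(0, m.shape[0], tile):
        for j in range(0, m.shape[1], tile):
            block = m[i:i + tile, j:j + tile]
            if np.any(block):
                blocks[(i // tile, j // tile)] = block
    return blocks


def build_task_list(a_blocks, b_blocks):
    """List (i, k, j) triples where block A(i, k) and block B(k, j) are both non-empty."""
    b_cols_by_row = defaultdict(list)
    for (k, j) in b_blocks:
        b_cols_by_row[k].append(j)
    return [(i, k, j) for (i, k) in a_blocks for j in b_cols_by_row[k]]


def tiled_spgemm(a, b, tile=TILE):
    """Multiply a @ b block by block, skipping block pairs that cannot contribute."""
    assert a.shape[1] == b.shape[0], "inner dimensions must match"
    assert all(d % tile == 0 for d in (*a.shape, *b.shape)), "pad inputs to a multiple of tile"
    a_blocks = nonempty_tiles(a, tile)
    b_blocks = nonempty_tiles(b, tile)
    tasks = build_task_list(a_blocks, b_blocks)
    c = np.zeros((a.shape[0], b.shape[1]), dtype=np.float32)
    for i, k, j in tasks:
        # Emulate mixed precision: inputs rounded to float16, products accumulated in float32.
        a16 = a_blocks[(i, k)].astype(np.float16).astype(np.float32)
        b16 = b_blocks[(k, j)].astype(np.float16).astype(np.float32)
        c[i * tile:(i + 1) * tile, j * tile:(j + 1) * tile] += a16 @ b16
    return c, tasks
```

Calling `tiled_spgemm(a, b)` on two zero-padded dense arrays returns the product together with the task list, so the number of block multiplications actually performed can be compared against the total number of block pairs. On a GPU, each task would presumably map to a tensor-core block multiply; that mapping is outside the scope of this sketch.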

Recommended

Article Computer Science, Information Systems

OpSparse: A Highly Optimized Framework for Sparse General Matrix Multiplication on GPUs

Zhaoyang Du, Yijin Guan, Tianchan Guan, Dimin Niu, Linyong Huang, Hongzhong Zheng, Yuan Xie

Summary: Sparse general matrix multiplication (SpGEMM) is an important computation in many applications, but achieving high-performance SpGEMM on modern processors is challenging. Existing SpGEMM libraries focus on algorithm design but neglect low-level architecture-specific optimizations, resulting in inefficient implementations. This paper proposes a highly optimized SpGEMM library called OpSparse, which improves performance through various optimization techniques such as optimizing memory utilization, reducing access to hash tables, and improving execution parallelism. Evaluation results on an Nvidia Tesla V100 GPU show significant speedups compared to state-of-the-art SpGEMM libraries.

IEEE ACCESS (2022)

Article Computer Science, Theory & Methods

A Systematic Survey of General Sparse Matrix-matrix Multiplication

Jianhua Gao, Weixing Ji, Fangli Chang, Shiyu Han, Bingxin Wei, Zeming Liu, Yizhuo Wang

Summary: This article provides a structured and comprehensive overview of the research on General Sparse Matrix-Matrix Multiplication (SpGEMM). It categorizes existing research based on target architectures and design choices, covering topics such as applications, compression formats, formulations, optimizations, and programming models. The article analyzes and summarizes the rationales of different algorithms and presents a thorough performance comparison of existing implementations. Future research directions are also highlighted to encourage better design and implementations in later studies.

ACM COMPUTING SURVEYS (2023)

Article Computer Science, Information Systems

Accelerating CPU-Based Sparse General Matrix Multiplication With Binary Row Merging

Zhaoyang Du, Yijin Guan, Tianchan Guan, Dimin Niu, Hongzhong Zheng, Yuan Xie

Summary: Sparse general matrix multiplication (SpGEMM) is a fundamental building block for many real-world applications. This paper proposes a novel and efficient accumulation method named BRMerge for multi-core CPU architectures. The proposed method demonstrates improved memory access efficiency and outperforms the existing SpGEMM libraries in terms of performance in the evaluations with commonly used benchmarks.

IEEE ACCESS (2022)

Article Computer Science, Theory & Methods

GPU Tensor Cores for Fast Arithmetic Reductions

Cristobal A. Navarro, Roberto Carrasco, Ricardo J. Barrientos, Javier A. Riquelme, Raimundo Vega

Summary: This article introduces a parallel algorithm for arithmetic reduction using GPU tensor cores, achieving faster performance and energy efficiency. Experimental results demonstrate that the proposed method outperforms standard GPU reduction and Nvidia's CUB library by approximately 3.2x and 2x, respectively, while maintaining low numerical error.

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS (2021)

Article Computer Science, Hardware & Architecture

High-Performance Tensor Learning Primitives Using GPU Tensor Cores

Xiao-Yang Liu, Zeliang Zhang, Zhiyuan Wang, Han Lu, Xiaodong Wang, Anwar Walid

Summary: This paper presents hardware-oriented optimization strategies for tensor learning primitives on GPU tensor cores, resulting in significant speedups for tasks such as tensor decomposition and neural network compression. The proposed optimizations achieve up to 32.25x speedup compared to existing libraries like TensorLab and TensorLy, demonstrating the effectiveness of GPU-based tensor learning.

IEEE TRANSACTIONS ON COMPUTERS (2023)

Article Computer Science, Theory & Methods

A Novel Parallel Algorithm for Sparse Tensor Matrix Chain Multiplication via TCU-Acceleration

Haotian Wang, Wangdong Yang, Rong Hu, Renqiu Ouyang, Kenli Li, Keqin Li

Summary: This paper presents a novel approach called SpTMCM and investigates its coupling with the Tensor Core Unit (TCU). The proposed approach offers a uniform storage format and optimization method for SpTMCM, addressing the inefficient memory accesses caused by irregular distribution of sparse tensors. A TCU-based tensor parallel algorithm is developed to improve memory bandwidth. Experimental results show significant speedups compared to state-of-the-art methods for SpMTTKRP and SpTTMChain on real-world sparse tensors using NVIDIA A100 GPU.

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS (2023)

Article Physics, Multidisciplinary

Acceleration of Approximate Matrix Multiplications on GPUs

Takuya Okuyama, Andre Rohm, Takatomo Mihana, Makoto Naruse

Summary: Matrix multiplication is important for various applications, and reducing computation time is crucial. Despite the potential of GPUs, research has not focused on accelerating AMMs for general matrices. In this paper, we propose a method to improve Monte Carlo AMMs, with optimal values for hyperparameters. The proposed method enhances matrix product approximation without increasing computation time, and is compatible with parallel operations on GPUs, demonstrating halved computation time compared to the conventional power method.

ENTROPY (2023)

Article Chemistry, Multidisciplinary

On the Safe Deployment of Matrix Multiplication in Massively Parallel Safety-Related Systems

Javier Fernandez, Jon Perez-Cerrolaza, Irune Agirre, Alejandro J. Calderon, Jaume Abella, Francisco J. Cazorla

Summary: This paper presents a safe matrix-matrix multiplication software implementation for GPUs with random hardware error-detection capabilities, which serves as a foundation for the implementation of safe deep learning libraries for GPUs. The performance impact and achievable diagnostic coverage of these mechanisms are measured with a set of representative matrix dimensions.

APPLIED SCIENCES-BASEL (2022)

Article Geochemistry & Geophysics

Joint Nonlinear Inversion of Full Tensor Gravity Gradiometry Data and Its Parallel Algorithm

Zhenlong Hou, Boxuan Sun, Pengbo Qin, Chong Zhang, Zhaohai Meng

Summary: This paper proposes a parallel joint nonlinear inversion method for full tensor gravity gradiometry data, aiming to improve interpretation and computing ability. By utilizing a graphics processing unit (GPU), a parallel solution is implemented. Data tests demonstrate that this method has good anti-noise performance and accuracy, making it suitable for large-scale inversions.

IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING (2022)

Article Computer Science, Theory & Methods

Adaptive diagonal sparse matrix-vector multiplication on GPU

Jiaquan Gao, Yifei Xia, Renjie Yin, Guixia He

Summary: An adaptive sparse matrix-vector multiplication (SpMV) for diagonal sparse matrices on GPU, named DIA-Adaptive, is presented to automatically choose the ideal storage format and kernel, achieving high performance.

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING (2021)

Article Computer Science, Software Engineering

A dynamic parameter tuning method for SpMM parallel execution

Bin Qi, Kazuhiko Komatsu, Masayuki Sato, Hiroaki Kobayashi

Summary: Sparse matrix-matrix multiplication is a fundamental kernel used in many algorithms. This article proposes a dynamic parameter tuning method to balance the load among processes in order to improve the performance of SpMM.

CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE (2023)

Article Computer Science, Information Systems

S-MPEC: Sparse Matrix Multiplication Performance Estimator on a Cloud Environment

Jueon Park, Kyungyong Lee

Summary: In this paper, we propose a model called S-MPEC for predicting and optimizing the latency of sparse matrix multiplication (SPMM) tasks in distributed cloud environments using Apache Spark. By characterizing different distributed SPMM implementation methods and considering the characteristics and hardware specifications of the cloud, we establish an accurate prediction model that recommends the optimal implementation method. The experimental results show that users can expect a 44% reduction in latency compared to native SPMM implementations in Apache Spark.

CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS (2023)

Article Computer Science, Software Engineering

Advancing on an efficient sparse matrix multiplication kernel for modern GPUs

Gonzalo Berger, Manuel Freire, Renzo Marini, Ernesto Dufrechou, Pablo Ezzatti

Summary: Sparse matrix multiplication has become increasingly important in data science and machine learning applications, leading to research focusing on accelerating this kernel on GPUs. By introducing new sparse matrix storage formats to mitigate irregularity, the proposed optimizations can significantly outperform existing implementations in experiments and compete with mature algorithms.

CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE (2022)

Article Computer Science, Artificial Intelligence

Numerical behavior of NVIDIA tensor cores

Massimiliano Fasi, Nicholas J. Higham, Mantas Mikaitis, Srikara Pranesh

Summary: The study investigates the floating-point arithmetic implemented in NVIDIA tensor cores, determining important details through experiments on different graphics cards. It also provides a test suite that can be easily adapted for testing newer versions of NVIDIA tensor cores and similar accelerators from other vendors.

PEERJ COMPUTER SCIENCE (2021)

Article Computer Science, Software Engineering

A new diagonal storage for efficient implementation of sparse matrix-vector multiplication on graphics processing unit

Guixia He, Qi Chen, Jiaquan Gao

Summary: This paper introduces a new diagonal storage format RBDCS and proposes an efficient SpMV kernel for handling multidiagonal sparse matrices. Experimental results demonstrate that the RBDCS kernel outperforms popular diagonal SpMV kernels.

CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE (2021)

Article Computer Science, Hardware & Architecture

Lightweight method of shuffling overlapped data-blocks for data integrity and security in WSNs

Francisco Alcaraz Velasco, Jose Manuel Palomares, Joaquin Olivares

Summary: This study introduces a new data integrity method with medium security levels and low energy cost in wireless sensor networks, using a lightweight mechanism with overlapping blocks for data protection, demonstrating its effectiveness through experiments.

COMPUTER NETWORKS (2021)

Article Computer Science, Hardware & Architecture

FPGA-Based Near-Memory Acceleration of Modern Data-Intensive Applications

Gagandeep Singh, Mohammed Alser, Damla Senol Cali, Dionysios Diamantopoulos, Juan Gomez-Luna, Henk Corporaal, Onur Mutlu

Summary: Modern data-intensive applications require high computational capabilities but are limited by strict power constraints. The development of FPGAs with HBM provides a solution to alleviate the bottleneck of data movement, improving efficiency and energy savings in computing systems.

IEEE MICRO (2021)

Article Computer Science, Hardware & Architecture

Accelerating Weather Prediction Using Near-Memory Reconfigurable Fabric

Gagandeep Singh, Dionysios Diamantopoulos, Juan Gomez-Luna, Christoph Hagleitner, Sander Stuijk, Henk Corporaal, Onur Mutlu

Summary: The ongoing climate change requires fast and accurate weather and climate modeling. However, current CPU and GPU implementations face limitations in performance and energy consumption for large-scale weather prediction simulations. To overcome these challenges, near-memory acceleration using high-bandwidth memory (HBM) is proposed and evaluated. Experimental results show significant performance improvement and energy efficiency compared to traditional methods.

ACM TRANSACTIONS ON RECONFIGURABLE TECHNOLOGY AND SYSTEMS (2022)

Article Computer Science, Hardware & Architecture

PiDRAM: A Holistic End-to-end FPGA-based Framework for Processing-in-DRAM

Ataberk Olgun, Juan Gomez Luna, Konstantinos Kanellopoulos, Behzad Salami, Hasan Hassan, Oguz Ergin, Onur Mutlu

Summary: This paper introduces commodity DRAM-based processing-using-memory (PuM) techniques that can alleviate the data movement bottleneck at low cost. The challenges of system integration for these techniques are discussed, and a flexible framework called Processing-in-DRAM (PiDRAM) is developed to address these challenges. The authors implement and evaluate two PuM techniques, demonstrating the flexibility and effectiveness of PiDRAM. The potential performance improvement brought by PiDRAM is observed.

ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION (2022)

Article Computer Science, Hardware & Architecture

Accelerating Neural Network Inference With Processing-in-DRAM: From the Edge to the Cloud

Geraldo F. Oliveira, Juan Gomez-Luna, Saugata Ghose, Amirali Boroumand, Onur Mutlu

Summary: Neural networks (NNs) are becoming increasingly important and complex. The processing-in-memory (PIM) paradigm can accelerate memory-bound NNs, but different PIM architectures have different effects on NN performance and energy efficiency.

IEEE MICRO (2022)

Article Computer Science, Artificial Intelligence

3D reconstruction system and multiobject local tracking algorithm designed for billiards

Francisco J. J. Rodriguez-Lozano, Juan C. C. Gamez-Granados, Hector Martinez, Jose M. M. Palomares, Joaquin Olivares

Summary: The use of virtual reality or augmented reality systems in billiards is helpful for entertainment and for improving players' skills. However, tracking multiple small identical objects like balls can be challenging. This research proposes a new tracking algorithm called MOLT, which can accurately track balls even with motion blur caused by low-resolution and low-frame-rate devices. The proposed system covers all steps from image capture to 3D reconstruction using computer vision, providing a promising and useful tool for training.

APPLIED INTELLIGENCE (2023)

Article Computer Science, Interdisciplinary Applications

Efficient data dimensionality reduction method for improving road crack classification algorithms

Francisco J. Rodriguez-Lozano, Juan C. Gamez-Granados, Jose M. Palomares, Joaquin Olivares

Summary: Automatic crack classification is important for road maintenance. However, using many features for classification is inefficient for embedded systems with low computational resources. This study proposes a new data dimensionality reduction (DDR) method called DDR4CC, which reduces the required information about cracks to only four features. The effectiveness of DDR4CC is compared with eight other DDR methods using five different classification algorithms and datasets. Results show that DDR4CC improves the classification algorithms, providing highly accurate classifiers with minimal computation time.

COMPUTER-AIDED CIVIL AND INFRASTRUCTURE ENGINEERING (2023)

Article Computer Science, Information Systems

ALP: Alleviating CPU-Memory Data Movement Overheads in Memory-Centric Systems

Nika Mansouri Ghiasi, Nandita Vijaykumar, Geraldo F. Oliveira, Lois Orosa, Ivan Fernandez, Mohammad Sadrosadati, Konstantinos Kanellopoulos, Nastaran Hajinazar, Juan Gomez Luna, Onur Mutlu

Summary: Partitioning applications between near-data processing (NDP) and host CPU cores causes inter-segment data movement overhead, which can be mitigated by ALP, a programmer-transparent technique that proactively and accurately transfers required data between segments based on the invariant instructions. Evaluation on a wide range of workloads demonstrates significant speedup over traditional CPU-only and NDP-only executions.

IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTING (2023)

Article Computer Science, Information Systems

Casper: Accelerating Stencil Computations Using Near-Cache Processing

Alain Denzler, Geraldo F. Oliveira, Nastaran Hajinazar, Rahul Bera, Gagandeep Singh, Juan Gomez-Luna, Onur Mutlu

Summary: This paper introduces Casper, a near-cache accelerator that improves the performance of stencil computations and reduces system energy consumption. Casper is designed based on two key ideas: avoiding the cost of moving rarely reused data throughout the cache hierarchy, and exploiting the regularity of data accesses and inherent parallelism of stencil computations. Experimental results show that Casper improves performance by an average of 1.65x (up to 4.16x) compared to commercial high-performance multi-core processors, while reducing system energy consumption by an average of 35% (up to 65%). Casper provides 37x (up to 190x) improvement in performance-per-area compared to a state-of-the-art GPU.

IEEE ACCESS (2023)

Proceedings Paper Computer Science, Artificial Intelligence

A Preliminary Fuzzy Markup Language based Approach for the Queue Buffer Size Optimization in Fog Nodes for Stream Processing

Gregorio Corpas-Prieto, Fernando Leon-Garcia, Juan Carlos Gamez-Granados, Jose Manuel Palomares, Joaquin Olivares, Jose Manuel Soto-Hidalgo

Summary: The Internet of Things (IoT) is divided into edge, fog, and cloud layers. The fog layer enables stream processing by handling data transmission and cascade processing. To optimize network traffic, factors such as connections, delays, and buffer size need to be considered, which are affected by uncertainty and imprecision. Fuzzy rule-based systems are suitable for managing complex data and imprecision. The proposed approach dynamically adjusts buffer size to prevent network collapse.

2022 IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS (FUZZ-IEEE) (2022)

Proceedings Paper Computer Science, Information Systems

Optimum Vessel Segmentation

Joaquin Olivares, Orestis Zachariadis, Nitin Satpute, Juan Gomez-Luna

Summary: Accurate blood vessel segmentation in medical imaging is crucial for surgeries. In this study, we introduce a parallelized region growth algorithm (pSRG) that computes the gradient using Persistence and grid-stride loops. This approach eliminates unnecessary memory transfers, leading to faster computation and more precise segmentation.

2022 17TH IBERIAN CONFERENCE ON INFORMATION SYSTEMS AND TECHNOLOGIES (CISTI) (2022)

Proceedings Paper Computer Science, Information Systems

Analysis of the random shuffling of message blocks as a low-cost integrity and security measure

Francisco Alcaraz-Velasco, Jose M. Palomares, Joaquin Olivares

Summary: Recently, a mechanism that randomly shuffles the data sent and allows securing the communication without the need to encrypt all the information has been proposed. This proposal is ideal for IoT systems with low computational capacity. It has been demonstrated that obtaining the original message without knowledge of the applied disordering is unfeasible with current technology, ensuring its safety.

2022 17TH IBERIAN CONFERENCE ON INFORMATION SYSTEMS AND TECHNOLOGIES (CISTI) (2022)

Article Computer Science, Information Systems

Benchmarking a New Paradigm: Experimental Analysis and Characterization of a Real Processing-in-Memory System

Juan Gomez-Luna, Izzat El Hajj, Ivan Fernandez, Christina Giannoula, Geraldo F. Oliveira, Onur Mutlu

Summary: This paper provides a comprehensive analysis of the first publicly-available real-world PIM architecture. Experimental characterization and benchmark evaluation on the UPMEM PIM system offer new insights into performance, energy consumption, and suitability for different workloads.

IEEE ACCESS (2022)

Article Computer Science, Information Systems

Cross-Modality Guided Contrast Enhancement for Improved Liver Tumor Image Segmentation

Rabia Naseem, Zohaib Amjad Khan, Nitin Satpute, Azeddine Beghdadi, Faouzi Alaya Cheikh, Joaquin Olivares

Summary: The proposed goal-oriented contrast enhancement method improves tumor segmentation performance by enhancing the guided image and controlling image quality through optimization.

IEEE ACCESS (2021)

Article Computer Science, Information Systems

DAMOV: A New Methodology and Benchmark Suite for Evaluating Data Movement Bottlenecks

Geraldo F. Oliveira, Juan Gomez-Luna, Lois Orosa, Saugata Ghose, Nandita Vijaykumar, Ivan Fernandez, Mohammad Sadrosadati, Onur Mutlu

Summary: Data movement between the CPU and main memory is a major bottleneck for improving performance, scalability, and energy efficiency in modern computer systems. Various techniques have been employed to reduce this overhead, from traditional cache hierarchies to emerging Near-Data Processing (NDP) methods. However, there is still a lack of understanding regarding the key metrics for identifying data movement bottlenecks and their relation to different mitigation mechanisms.

IEEE ACCESS (2021)

Article Computer Science, Hardware & Architecture

Discovering e-commerce user groups from online comments: An emotional correlation analysis-based clustering method

Jia Ke, Ying Wang, Mingyue Fan, Xiaojun Chen, Wenlong Zhang, Jianping Gou

Summary: This study integrates the emotional correlation analysis model and Self-organizing Map (SOM) to construct fine-grained user emotion vector based on review text and perform visual cluster analysis, which helps platform merchants quickly mine user clustering and characteristics.

COMPUTERS & ELECTRICAL ENGINEERING (2024)

Article Computer Science, Hardware & Architecture

Multilevel-based algorithm for hyperspectral image interpretation

Shi Qiu, Huping Ye, Xiaohan Liao, Benyue Zhang, Miao Zhang, Zimu Zeng

Summary: This paper proposes a multilevel-based algorithm for hyperspectral image interpretation, which achieves semantic segmentation through multidimensional information fusion, and introduces a context interpretation module to improve detection performance.

COMPUTERS & ELECTRICAL ENGINEERING (2024)

Article Computer Science, Hardware & Architecture

Maximizing the profit of omnichannel closed-loop supply chains with mean-variance criteria

Jianteng Xu, Qingguo Bai, Zhiwen Li, Lili Zhao

Summary: This study constructs two optimization models for the omnichannel closed-loop supply chain by leveraging the combined power of leader-follower game and mean-variance theories. The focus is on analyzing the performance of manufacturers who distribute products through physical stores. The results show that the risk-averse attitude of the physical store has a positive impact on the overall system profitability, but if the introduced physical store belongs to another firm, total profit experiences a decline.

COMPUTERS & ELECTRICAL ENGINEERING (2024)

Article Computer Science, Hardware & Architecture

GraphPhys: Facial video-based physiological measurement with graph neural network

Jiahao Xiong, Weihua Ou, Zhonghua Liu, Jianping Gou, Wenjun Xiao, Haitao Liu

Summary: This paper proposes a novel remote photoplethysmography framework, named GraphPhys, which utilizes graph neural network to extract physiological signals and introduces Average Relative GraphConv for the task of remote physiological signal measurement. Experimental results show that the methods based on GraphPhys significantly outperform the original methods.

COMPUTERS & ELECTRICAL ENGINEERING (2024)

Article Computer Science, Hardware & Architecture

User financial credit analysis for blockchain regulation

Zhiyao Tong, Yiyi Hu, Chi Jiang, Yin Zhang

Summary: The rise of illicit activities involving blockchain digital currencies has become a growing concern. In order to prevent illegal activities, this study combines financial risk control with machine learning to identify and predict the risks of users with poor credit. Experimental results demonstrate high performance in user financial credit analysis.

COMPUTERS & ELECTRICAL ENGINEERING (2024)
