4.2 Article

A GPU-based tabu search for very large hardware/software partitioning with limited resource usage

Publisher

JAPAN SOC MECHANICAL ENGINEERS
DOI: 10.1299/jamdsm.2017jamdsm0060

Keywords

Hardware/software co-design; Hardware/software partitioning; GPU-based tabu search; GPU resource-limitation; Time-space tradeoff

Funding

  1. National Science Foundation of China [61472289]
  2. Key Technology RAMP
  3. D Program of Hubei Province [2014 BAA153]

Ask authors/readers for more resources

In hardware/software (HW/SW) co-design, HW/SW partitioning is the most important step since it determines which components are implemented in hardware and which are implemented in software. Since most of HW/SW partitioning problems are NP hard, heuristic methods have to be utilized to solve them, especially for the large size problems. GPU-based heuristic methods to accelerate HW/SW co-design are a promising way to reduce run time. However, the existing methods cannot deal with very large embedded applications because of GPU resource limitations. This paper presents a method to overcome the GPU resource limitations for very large partitioning while keeping a reasonable runtime. First, at the stage of computing the costs of the candidates, we propose a fast method of 2-flipping computing for very large HW/SW co-design. Our method is also general and can deal with both odd and even numbers of nodes. More importantly, our method avoids utilizing double-precision arithmetic units, which are scarce resources in GPU architecture. Second, since the GPU is constrained by memory limitations and the costs of candidates cannot be directly stored in the GPU's global memory, we present a time-space tradeoff strategy to break memory limitations for very large HW/SW partitioning. In this way, the following steps can be run under the constraint of GPU's memory limitations. Third, an in-place removal of infeasible solutions is proposed to reduce the overhead of global memory by half when the neighborhood is compacted. Fourth, when evaluating the tabu status of feasible candidates, we present a bitwise representation of tabu status to minimize the transfer overhead. Finally, we conduct a number of experiments. The results show that the proposed 2-flipping method of single precision data types works well. The results also demonstrate that the proposed approach expands the number of nodes of the task graph from 10,000 to 30,000 under the limitation of the GPU's global memory of 6 GB. The correlations between compression intensity and solution quality are analyzed to ensure the fairness and soundness of our method. Our work is general and can provide guidance for other applications.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.2
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

Article Computer Science, Hardware & Architecture

Optimization of parallel iterated local search algorithms on graphics processing unit

Yi Zhou, Fazhi He, Yimin Qiu

JOURNAL OF SUPERCOMPUTING (2016)

Article Computer Science, Information Systems

Dynamic strategy based parallel ant colony optimization on GPUs for TSPs

Yi Zhou, Fazhi He, Yimin Qiu

SCIENCE CHINA-INFORMATION SCIENCES (2017)

Article Computer Science, Theory & Methods

Parallel ant colony optimization on multi-core SIMD CPUs

Yi Zhou, Fazhi He, Neng Hou, Yimin Qiu

FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE (2018)

Article Engineering, Electrical & Electronic

Accelerating image convolution filtering algorithms on integrated CPU-GPU architectures

Yi Zhou, Fazhi He, Yimin Qiu

JOURNAL OF ELECTRONIC IMAGING (2018)

Article Computer Science, Information Systems

A Parallel Genetic Algorithm With Dispersion Correction for HW/SW Partitioning on Multi-Core CPU and Many-Core GPU

Neng Hou, Fazhi He, Yi Zhou, Yilin Chen, Xiaohu Yan

IEEE ACCESS (2018)

Article Biochemistry & Molecular Biology

High-dose lovastatin decreased basal prostacyclin production in cultured endothelial cells

Qi Zhou, Yi Zhou, Fred A. Kummerow

PROSTAGLANDINS & OTHER LIPID MEDIATORS (2009)

Article Computer Science, Information Systems

An efficient GPU-based parallel tabu search algorithm for hardware/software co-design

Neng Hou, Fazhi He, Yi Zhou, Yilin Chen

FRONTIERS OF COMPUTER SCIENCE (2020)

Article Automation & Control Systems

Dual-Network Task Scheduling inCyber-Physical Systems: A Cooptimization Approach

Xiaomao Wang, Li Chai, Yi Zhou, Feng Dan

Summary: A cooptimization method based on software defined networking technology is proposed to enhance the performance of task scheduling and network communication in large and complex engineering systems. Simulation results show the proposed scheme outperforms the traditional method in terms of task acceptance rate, average end-to-end delay, and network load balance degree.

IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS (2021)

Article Computer Science, Artificial Intelligence

Adaptive gradient descent enabled ant colony optimization for routing problems

Yi Zhou, Weidong Li, Xiaomao Wang, Yimin Qiu, Weiming Shen

Summary: In this paper, a new type of ant colony optimization algorithm called ADACO is proposed with an innovative adaptive learning mechanism. The algorithm is validated through experiments on the traveling salesman problem (TSP) and the capacitated vehicle routing problem (CVRP), and it shows competitive performance in terms of accuracy, stability, and adaptability.

SWARM AND EVOLUTIONARY COMPUTATION (2022)

Proceedings Paper Automation & Control Systems

STEREOSCOPIC IMAGES ENHANCEMENT BASED ON EDGE SHARPENING OF WAVELET COEFFICIENTS

Qiu Yi-min, Chen Shi-hong, Zhou Yi, Liu Xin-hai

SENSORS, MECHATRONICS AND AUTOMATION (2014)

Proceedings Paper Engineering, Mechanical

3-D video enhancemnt based on histogram equalization and edge sharpening

Yimin Qiu, Shihong Chan, Yi Zhou, Ying Wang

MEASUREMENT TECHNOLOGY AND ENGINEERING RESEARCHES IN INDUSTRY, PTS 1-3 (2013)

No Data Available