Throughput-Optimized FPGA Accelerator for Deep Convolutional Neural Networks

ACM TRANSACTIONS ON RECONFIGURABLE TECHNOLOGY AND SYSTEMS (2017)

Journal

ACM TRANSACTIONS ON RECONFIGURABLE TECHNOLOGY AND SYSTEMS

Volume 10, Issue 3

Publisher

ASSOC COMPUTING MACHINERY

DOI: 10.1145/3079758

Keywords

Design; Experimentation; Performance; FPGA architecture; convolutional neural networks; optimisation; high performance computing; application mapping

Funding

National Science Foundation of China [U1435219, 61303070, 61402507, 61402499]

Abstract

Deep convolutional neural networks (CNNs) have achieved great success in various computer vision applications. State-of-the-art CNN models for large-scale applications are computation-intensive and memory-expensive and, hence, are mainly processed on high-performance processors such as server CPUs and GPUs. However, there is an increasing demand for high-accuracy or real-time object detection tasks in large-scale clusters or embedded systems, which require energy-efficient accelerators because of green-computing requirements or limited battery capacity. Owing to their energy efficiency and reconfigurability, Field-Programmable Gate Arrays (FPGAs) have been widely explored as CNN accelerators. In this article, we present an in-depth analysis of the computational complexity and memory footprint of each CNN layer type. We then propose a scalable parallel framework that exploits four levels of parallelism in hardware acceleration. We further put forward a systematic design space exploration methodology to search for the optimal solution that maximizes accelerator throughput under FPGA constraints such as on-chip memory, computational resources, external memory bandwidth, and clock frequency. Finally, we demonstrate the methodology by optimizing three representative CNNs (LeNet, AlexNet, and VGG-S) on a Xilinx VC709 board. The average performance of the three accelerators is 424.7, 445.6, and 473.4 GOP/s, respectively, at a 100 MHz working frequency, which significantly outperforms the CPU and previous work.
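
The abstract does not give the article's cost model, so the following is only a minimal sketch of the kind of per-layer analysis it describes. It assumes the common convention of counting one multiply-accumulate as two operations, and it assumes a simple roofline-style bound (attainable throughput is the smaller of the compute peak and the bandwidth-limited rate). The function names, the roofline bound, and the AlexNet first-layer shape used in the example are illustrative assumptions, not taken from the article.

```python
def conv_layer_ops(out_channels, in_channels, out_h, out_w, kernel):
    """Operation count of one convolutional layer.

    Assumption: each multiply-accumulate counts as 2 operations.
    """
    macs = out_channels * in_channels * out_h * out_w * kernel * kernel
    return 2 * macs


def conv_layer_weights(out_channels, in_channels, kernel):
    """Number of weights in one convolutional layer (biases ignored)."""
    return out_channels * in_channels * kernel * kernel


def attainable_gops(peak_gops, ops_per_byte, bandwidth_gbs):
    """Roofline-style bound (an assumed model, not the article's).

    Throughput is capped either by the compute peak or by how many
    operations each byte of external memory traffic can feed.
    """
    return min(peak_gops, ops_per_byte * bandwidth_gbs)


# Hypothetical example: the commonly cited shape of AlexNet's first
# convolutional layer (96 filters, 3 input channels, 55x55 output, 11x11 kernel).
ops = conv_layer_ops(96, 3, 55, 55, 11)
weights = conv_layer_weights(96, 3, 11)
print(ops, weights)

# A design point that reuses data poorly is bandwidth-bound,
# while one with the same peak but better reuse is compute-bound.
print(attainable_gops(peak_gops=500.0, ops_per_byte=10.0, bandwidth_gbs=16.0))
print(attainable_gops(peak_gops=500.0, ops_per_byte=40.0, bandwidth_gbs=16.0))
```

A design space exploration in this spirit would evaluate such a bound for every candidate configuration that fits the on-chip memory and compute budget, and keep the one with the highest attainable throughput.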
