Journal
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS
Volume 68, Issue 3, Pages 1134-1145
Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TCSI.2020.3043778
Keywords
Configurable data flow; deep neural network (DNN); mixed-precision inference; systolic array
Funding
- NSF [CCF-1725456]
The article introduces BitSystolic, a neural processing unit built on a systolic array. Its numerical precision is configurable from 2 to 8 bits, meeting the differing requirements of mixed-precision models and tasks. The unit also flexibly supports the data flows of various neural-layer types and adaptively optimizes data reuse by switching between operating modes.
Efficient deployment of deep neural networks (DNNs) is becoming critical as demand for artificial intelligence on edge devices explodes. Mixed-precision inference, which both compresses the model and reduces computation cost, offers a path to accurate and efficient DNN deployment. However, while mixed-precision DNN models can be obtained at the algorithmic level, hardware support for them remains insufficient. In this work, we propose BitSystolic, a neural processing unit based on a systolic array structure. In BitSystolic, the numerical precision of both weights and activations can be configured in the range of 2-8 bits, fulfilling different requirements across mixed-precision models and tasks. Moreover, BitSystolic supports the various data flows present in different types of neural layers (e.g., convolutional, fully-connected, and recurrent layers) and adaptively optimizes data reuse by switching between a matrix-matrix mode and a vector-matrix mode. We designed and fabricated the proposed BitSystolic with a 16x16 systolic array. Our measurement results show that BitSystolic achieves a unified power efficiency of up to 26.7 TOPS/W with 17.8 mW peak power consumption across the various layer types.
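To see how a single compute unit can serve precisions from 2 to 8 bits, consider a bit-serial multiply-accumulate: the weight is consumed one bit-plane per cycle, so lower precision simply means fewer cycles. The sketch below is a minimal illustration of this general technique, not the paper's actual PE design; the function name and unsigned-integer assumption are hypothetical.

```python
def bit_serial_mac(weights, activations, w_bits):
    """Compute dot(weights, activations) one weight bit-plane at a time.

    Assumes weights are unsigned integers in [0, 2**w_bits); a real
    design would also handle signed values and activation bit-slicing.
    """
    acc = 0
    for b in range(w_bits):  # one "cycle" per weight bit-plane
        # Partial sum using only bit b of every weight (0 or 1 each).
        plane = sum(((w >> b) & 1) * a for w, a in zip(weights, activations))
        acc += plane << b    # bit b of the weight carries weight 2**b
    return acc

# Matches the plain integer dot product; runtime scales with w_bits.
ws, xs = [3, 5, 7], [2, 4, 6]
print(bit_serial_mac(ws, xs, w_bits=3))  # 3*2 + 5*4 + 7*6 = 68
```

The same loop with `w_bits=2` or `w_bits=8` handles 2-bit or 8-bit weights on identical hardware, which is the essential trade-off a configurable-precision array exploits: precision is exchanged for cycles rather than for silicon.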