Journal
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS
Volume 68, Issue 3, Pages 1134-1145
Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TCSI.2020.3043778
Keywords
Configurable data flow; deep neural network (DNN); mixed-precision inference; systolic array
Funding
- NSF [CCF-1725456]
The article introduces BitSystolic, a neural processing unit built on a systolic array. Its numerical precision is configurable from 2 to 8 bits, meeting the differing requirements of mixed-precision models and tasks. The unit also flexibly supports the data flows of various neural-layer types and adaptively optimizes data reuse by switching between operating modes.
Efficient deployment of deep neural networks (DNNs) is becoming critical as demand for artificial intelligence on edge devices explodes. Mixed-precision inference, which both compresses the model and reduces computation cost, offers a path to accurate and efficient DNN deployment. However, while mixed-precision DNN models can be obtained at the algorithmic level, hardware support for them remains insufficient. In this work, we propose BitSystolic, a neural processing unit based on a systolic array structure. In BitSystolic, the numerical precision of both weights and activations can be configured in the range of 2-8 bits, fulfilling different requirements across mixed-precision models and tasks. Moreover, BitSystolic supports the various data flows present in different types of neural layers (e.g., convolutional, fully-connected, and recurrent layers) and adaptively optimizes data reuse by switching between a matrix-matrix mode and a vector-matrix mode. We designed and fabricated the proposed BitSystolic with a 16x16 systolic array. Our measurement results show that BitSystolic achieves a unified power efficiency of up to 26.7 TOPS/W with 17.8 mW peak power consumption across the various layer types.
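To see how a single compute unit can serve precisions from 2 to 8 bits, consider a bit-serial multiply-accumulate: the weight is consumed one bit-plane per cycle, so lower precision simply means fewer cycles. The sketch below is a minimal illustration of this general technique, not the paper's actual PE design; the function name and unsigned-integer assumption are hypothetical.

```python
def bit_serial_mac(weights, activations, w_bits):
    """Compute dot(weights, activations) one weight bit-plane at a time.

    Assumes weights are unsigned integers in [0, 2**w_bits); a real
    design would also handle signed values and activation bit-slicing.
    """
    acc = 0
    for b in range(w_bits):  # one "cycle" per weight bit-plane
        # Partial sum using only bit b of every weight (0 or 1 each).
        plane = sum(((w >> b) & 1) * a for w, a in zip(weights, activations))
        acc += plane << b    # bit b of the weight carries weight 2**b
    return acc

# Matches the plain integer dot product; runtime scales with w_bits.
ws, xs = [3, 5, 7], [2, 4, 6]
print(bit_serial_mac(ws, xs, w_bits=3))  # 3*2 + 5*4 + 7*6 = 68
```

The same loop with `w_bits=2` or `w_bits=8` handles 2-bit or 8-bit weights on identical hardware, which is the essential trade-off a configurable-precision array exploits: precision is exchanged for cycles rather than for silicon.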