Article

Serving DNNs in Real Time at Datacenter Scale with Project Brainwave

IEEE MICRO (2018)

Journal

IEEE MICRO

Volume 38, Issue 2, Pages 8-20

Publisher

IEEE COMPUTER SOC

DOI: 10.1109/MM.2018.022071131

Funding

Intel

Abstract

To meet the computational demands required of deep learning, cloud operators are turning toward specialized hardware for improved efficiency and performance. Project Brainwave, Microsoft's principal infrastructure for AI serving in real time, accelerates deep neural network (DNN) inferencing in major services such as Bing's intelligent search features and Azure. Exploiting distributed model parallelism and pinning over low-latency hardware microservices, Project Brainwave serves state-of-the-art, pre-trained DNN models with high efficiencies at low batch sizes. A high-performance, precision-adaptable FPGA soft processor is at the heart of the system, achieving up to 39.5 teraflops (Tflops) of effective performance at Batch 1 on a state-of-the-art Intel Stratix 10 FPGA.
