TY - JOUR
T1 - An Energy-Efficient Deep Convolutional Neural Network Inference Processor with Enhanced Output Stationary Dataflow in 65-nm CMOS
AU - Sim, Jaehyeong
AU - Lee, Somin
AU - Kim, Lee-Sup
Funding Information:
Manuscript received April 18, 2019; revised July 4, 2019 and July 22, 2019; accepted August 11, 2019. Date of publication September 2, 2019; date of current version December 27, 2019. This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean Government (MSIP) under Grant 2017R1A2B2009380. (Corresponding author: Lee-Sup Kim.) J. Sim and L.-S. Kim are with the School of Electrical Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, South Korea (e-mail: leesup@kaist.ac.kr).
Publisher Copyright:
© 1993-2012 IEEE.
PY - 2020/1
Y1 - 2020/1
N2 - We propose a deep convolutional neural network (CNN) inference processor based on a novel enhanced output stationary (EOS) dataflow. Based on the observation that some activations are commonly used in two successive convolutions, the EOS dataflow employs dedicated register files (RFs) to store such reused activation data, eliminating redundant accesses to highly energy-consuming SRAM banks. In addition, the processing elements (PEs) are split into multiple small groups such that each group covers a tile of the input activation map, increasing the utilization of the activation RFs (ARFs). The processor has two voltage/frequency domains: the computation domain, with 512 PEs, operates at near-threshold voltage (NTV) (0.4 V) and 60 MHz to increase energy efficiency, while the rest of the processor, including 848 KB of SRAM, runs at 0.7 V and 120 MHz to increase both on-chip and off-chip memory bandwidth. Measurement results show that our processor runs AlexNet at 831 GOPS/W, VGG-16 at 1151 GOPS/W, ResNet-18 at 1004 GOPS/W, and MobileNet at 948 GOPS/W.
AB - We propose a deep convolutional neural network (CNN) inference processor based on a novel enhanced output stationary (EOS) dataflow. Based on the observation that some activations are commonly used in two successive convolutions, the EOS dataflow employs dedicated register files (RFs) to store such reused activation data, eliminating redundant accesses to highly energy-consuming SRAM banks. In addition, the processing elements (PEs) are split into multiple small groups such that each group covers a tile of the input activation map, increasing the utilization of the activation RFs (ARFs). The processor has two voltage/frequency domains: the computation domain, with 512 PEs, operates at near-threshold voltage (NTV) (0.4 V) and 60 MHz to increase energy efficiency, while the rest of the processor, including 848 KB of SRAM, runs at 0.7 V and 120 MHz to increase both on-chip and off-chip memory bandwidth. Measurement results show that our processor runs AlexNet at 831 GOPS/W, VGG-16 at 1151 GOPS/W, ResNet-18 at 1004 GOPS/W, and MobileNet at 948 GOPS/W.
KW - Convolutional neural network (CNN)
KW - dataflow
KW - deep learning
KW - energy-efficient processor
KW - near-threshold voltage (NTV)
UR - http://www.scopus.com/inward/record.url?scp=85077823130&partnerID=8YFLogxK
U2 - 10.1109/TVLSI.2019.2935251
DO - 10.1109/TVLSI.2019.2935251
M3 - Article
AN - SCOPUS:85077823130
SN - 1063-8210
VL - 28
SP - 87
EP - 100
JO - IEEE Transactions on Very Large Scale Integration (VLSI) Systems
JF - IEEE Transactions on Very Large Scale Integration (VLSI) Systems
IS - 1
M1 - 8822636
ER -