TY - GEN
T1 - NID: Processing Binary Convolutional Neural Network in Commodity DRAM
T2 - 37th IEEE/ACM International Conference on Computer-Aided Design, ICCAD 2018
AU - Sim, Jaehyeong
AU - Seol, Hoseok
AU - Kim, Lee-Sup
N1 - Funding Information:
This research was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (No. 2017R1A2B2009380).
Publisher Copyright:
© 2018 ACM.
PY - 2018/11/5
Y1 - 2018/11/5
N2 - Recent large-scale CNNs suffer from a severe memory wall problem, as their weight counts range from tens to hundreds of millions. Processing in-memory (PIM) and binary CNNs have been proposed to reduce the number of memory accesses and the memory footprint, respectively. By combining these two concepts, we propose NID, a novel processing in-DRAM framework for binary CNNs in which the dominant convolution operations are processed using in-DRAM bulk bitwise operations. We first identify the problem that bitcount operations built only from bulk bitwise AND/OR/NOT incur significant delay overhead as kernel sizes grow. We then not only optimize performance by efficiently allocating inputs and kernels to DRAM banks for both convolutional and fully-connected layers through design space exploration, but also mitigate the overhead of bitcount operations by splitting kernels into multiple parts. Partial sum accumulations and the tasks of other layers, such as max-pooling and normalization, are processed in the peripheral area of the DRAM with negligible overhead. As a result, our NID framework achieves 19X-36X performance and 9X-14X EDP improvements for convolutional layers, and 9X-17X performance and 1.4X-4.5X EDP improvements for fully-connected layers, over a previous PIM technique on four large-scale CNN models.
AB - Recent large-scale CNNs suffer from a severe memory wall problem, as their weight counts range from tens to hundreds of millions. Processing in-memory (PIM) and binary CNNs have been proposed to reduce the number of memory accesses and the memory footprint, respectively. By combining these two concepts, we propose NID, a novel processing in-DRAM framework for binary CNNs in which the dominant convolution operations are processed using in-DRAM bulk bitwise operations. We first identify the problem that bitcount operations built only from bulk bitwise AND/OR/NOT incur significant delay overhead as kernel sizes grow. We then not only optimize performance by efficiently allocating inputs and kernels to DRAM banks for both convolutional and fully-connected layers through design space exploration, but also mitigate the overhead of bitcount operations by splitting kernels into multiple parts. Partial sum accumulations and the tasks of other layers, such as max-pooling and normalization, are processed in the peripheral area of the DRAM with negligible overhead. As a result, our NID framework achieves 19X-36X performance and 9X-14X EDP improvements for convolutional layers, and 9X-17X performance and 1.4X-4.5X EDP improvements for fully-connected layers, over a previous PIM technique on four large-scale CNN models.
UR - http://www.scopus.com/inward/record.url?scp=85058170691&partnerID=8YFLogxK
U2 - 10.1145/3240765.3240831
DO - 10.1145/3240765.3240831
M3 - Conference contribution
AN - SCOPUS:85058170691
T3 - IEEE/ACM International Conference on Computer-Aided Design, Digest of Technical Papers, ICCAD
BT - 2018 IEEE/ACM International Conference on Computer-Aided Design, ICCAD 2018 - Digest of Technical Papers
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 5 November 2018 through 8 November 2018
ER -