An Energy-Efficient Deep Convolutional Neural Network Training Accelerator for in Situ Personalization on Smart Devices

Seungkyu Choi, Jaehyeong Sim, Myeonggu Kang, Yeongjae Choi, Hyeonuk Kim, Lee Sup Kim

Research output: Contribution to journal › Article › peer-review

31 Scopus citations


A scalable deep-learning accelerator supporting the training process is implemented for device personalization of deep convolutional neural networks (CNNs). It consists of three processor cores, each operating with a distinct energy-efficient dataflow for a different type of computation in CNN training. Unlike previous works, which apply design techniques that exploit the same characteristics as inference, we analyze the major issues that arise when training in a resource-constrained system and resolve the resulting bottlenecks. A masking scheme in the propagation core greatly reduces the amount of intermediate activation data that must be stored. It eliminates the frequent off-chip memory accesses otherwise needed to hold the generated activation data until the backward path. A disparate dataflow architecture is implemented for the weight gradient computation to enhance processing element (PE) utilization while maximally reusing the input data. Furthermore, the modified weight update system enables an 8-bit fixed-point computing datapath. The processor is implemented in 65-nm CMOS technology and occupies 10.24 mm² of core area. It operates with a supply voltage from 0.63 to 1.0 V, and the computing engine runs at a near-threshold voltage of 0.5 V. The chip consumes 40.7 mW at 50 MHz at its highest efficiency and achieves a training efficiency of 47.4 µJ/epoch for the customized CNN model.
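The abstract does not describe how the masking scheme works internally. As a rough illustration of the general idea only, the sketch below assumes a ReLU layer and shows how a 1-bit mask saved in the forward pass is enough to route gradients in the backward pass, so the full-precision activations need not be kept for that step. All function names and the 8-bit activation width used in the comparison are assumptions for illustration, not details of the chip.

```python
# Hypothetical sketch: names and data are illustrative, not taken from the paper.

def relu_forward(x):
    """Return ReLU outputs and a 1-bit-per-element mask (True where x > 0)."""
    mask = [v > 0 for v in x]
    out = [v if m else 0 for v, m in zip(x, mask)]
    return out, mask

def relu_backward(grad_out, mask):
    """Pass gradients through only where the stored mask bit is set."""
    return [g if m else 0 for g, m in zip(grad_out, mask)]

def storage_bits(n, bits_per_activation=8):
    """Bits needed to keep n activations versus n mask bits (assumed 8-bit data)."""
    return n * bits_per_activation, n

x = [3, -2, 0, 5]
out, mask = relu_forward(x)
grad_in = relu_backward([10, 20, 30, 40], mask)
```

Under these assumptions, the mask takes one bit per element instead of eight, which is the kind of saving that could reduce off-chip traffic between the forward and backward passes.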

Original language: English
Article number: 9137200
Pages (from-to): 2691-2702
Number of pages: 12
Journal: IEEE Journal of Solid-State Circuits
Issue number: 10
State: Published - Oct 2020

Bibliographical note

Funding Information:
Manuscript received January 28, 2020; revised April 1, 2020; accepted May 14, 2020. Date of publication July 9, 2020; date of current version September 24, 2020. This article was approved by Associate Editor Atsushi Kawasumi. This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. NRF-2017R1A2B2009380). (Corresponding author: Lee-Sup Kim.) The authors are with the Department of Electrical Engineering, Korea Advanced Institute of Science and Technology, Daejeon 34141, South Korea.

Publisher Copyright:
© 1966-2012 IEEE.


  • Convolutional neural network (CNN)
  • dataflow
  • deep learning
  • deep-learning application-specific integrated circuit (ASIC)
  • neural network training
  • training processor

