A 47.4µJ/epoch trainable deep convolutional neural network accelerator for in-situ personalization on smart devices

Seungkyu Choi, Jaehyeong Sim, Myeonggu Kang, Yeongjae Choi, Hyeonuk Kim, Lee Sup Kim

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

1 Scopus citations

Abstract

A scalable deep learning accelerator supporting both inference and training is implemented for device personalization of deep convolutional neural networks. It consists of three processor cores operating with distinct energy-efficient dataflow for different types of computation in CNN training. Two cores conduct forward and backward propagation in convolutional layers and utilize a masking scheme to reduce 88.3% of intermediate data to store for training. The third core executes weight update process in convolutional layers and inner product computation in fully connected layers with a novel large window dataflow. The system enables 8-bit fixed point datapath with lossless training and consumes 47.4µJ/epoch for a customized deep CNN model.

Original languageEnglish
Title of host publicationProceedings - 2019 IEEE Asian Solid-State Circuits Conference, A-SSCC 2019
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages57-60
Number of pages4
ISBN (Electronic)9781728151069
DOIs
StatePublished - Nov 2019
Event15th IEEE Asian Solid-State Circuits Conference, A-SSCC 2019 - Macao, China
Duration: 4 Nov 20196 Nov 2019

Publication series

NameProceedings - 2019 IEEE Asian Solid-State Circuits Conference, A-SSCC 2019
Volume2019-November

Conference

Conference15th IEEE Asian Solid-State Circuits Conference, A-SSCC 2019
Country/TerritoryChina
CityMacao
Period4/11/196/11/19

Fingerprint

Dive into the research topics of 'A 47.4µJ/epoch trainable deep convolutional neural network accelerator for in-situ personalization on smart devices'. Together they form a unique fingerprint.

Cite this