Abstract
A scalable deep learning accelerator supporting both inference and training is implemented for device personalization of deep convolutional neural networks. It consists of three processor cores, each operating with a distinct energy-efficient dataflow tailored to a different type of computation in CNN training. Two cores perform forward and backward propagation in convolutional layers and use a masking scheme that reduces the intermediate data stored for training by 88.3%. The third core executes the weight-update process in convolutional layers and the inner-product computation in fully connected layers with a novel large-window dataflow. The system uses an 8-bit fixed-point datapath with lossless training and consumes 47.4 µJ/epoch for a customized deep CNN model.
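The abstract does not spell out how the masking scheme works. A common way to cut activation storage for the backward pass of ReLU layers is to keep only a 1-bit sign mask per element instead of the full stored activation; the sketch below illustrates that general idea in NumPy and is an assumption for illustration, not the accelerator's actual scheme (function names, shapes, and the ReLU assumption are all hypothetical).

```python
import numpy as np

def relu_forward(x):
    """Forward ReLU; store only a packed 1-bit mask for the backward pass."""
    mask = x > 0                          # 1 bit of information per element
    y = np.where(mask, x, 0.0)
    packed = np.packbits(mask.ravel())    # ~8x smaller than an 8-bit activation map
    return y, packed, x.shape

def relu_backward(grad_y, packed, shape):
    """Backward ReLU using only the stored mask; no saved activations needed."""
    n = int(np.prod(shape))
    mask = np.unpackbits(packed)[:n].reshape(shape).astype(bool)
    return np.where(mask, grad_y, 0.0)

# Illustrative usage: replacing an 8-bit stored activation with a 1-bit mask
# saves 87.5% of activation storage, in the same ballpark as the 88.3%
# reduction reported in the abstract (the exact accounting there is not given).
x = np.random.randn(4, 8, 8).astype(np.float32)
y, packed, shape = relu_forward(x)
grad_x = relu_backward(np.ones_like(y), packed, shape)
```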
Original language | English |
---|---|
Title of host publication | Proceedings - 2019 IEEE Asian Solid-State Circuits Conference, A-SSCC 2019 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 57-60 |
Number of pages | 4 |
ISBN (Electronic) | 9781728151069 |
DOIs | |
State | Published - Nov 2019 |
Event | 15th IEEE Asian Solid-State Circuits Conference, A-SSCC 2019 - Macao, China (4 Nov 2019 → 6 Nov 2019) |
Publication series
Name | Proceedings - 2019 IEEE Asian Solid-State Circuits Conference, A-SSCC 2019 |
---|---|
Volume | 2019-November |
Conference
Conference | 15th IEEE Asian Solid-State Circuits Conference, A-SSCC 2019 |
---|---|
Country/Territory | China |
City | Macao |
Period | 4/11/19 → 6/11/19 |
Bibliographical note
Publisher Copyright: © 2019 Institute of Electrical and Electronics Engineers Inc. All rights reserved.