Abstract
A scalable deep learning accelerator supporting both inference and training is implemented for on-device personalization of deep convolutional neural networks. It consists of three processor cores, each operating with a distinct energy-efficient dataflow for a different type of computation in CNN training. Two cores conduct forward and backward propagation in convolutional layers and utilize a masking scheme that reduces the intermediate data stored for training by 88.3%. The third core executes the weight-update process in convolutional layers and inner-product computation in fully connected layers with a novel large-window dataflow. The system uses an 8-bit fixed-point datapath with lossless training and consumes 47.4 µJ/epoch for a customized deep CNN model.
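The masking idea referenced above can be illustrated with a minimal NumPy sketch of the general principle: after a ReLU, the backward pass only needs a one-bit sign mask per element rather than the full activation tensor, which sharply cuts the intermediate data held between forward and backward propagation. This is not the paper's hardware masking scheme, and it does not reproduce the reported 88.3% figure; the tensor shape, the int8 datapath, and the function names below are illustrative assumptions.

```python
import numpy as np

def relu_forward(x):
    """Forward ReLU; keep only a boolean mask (logically 1 bit/element) for backprop."""
    mask = x > 0
    return x * mask, mask

def relu_backward(grad_out, mask):
    """Backward ReLU using only the stored mask, not the full activation."""
    return grad_out * mask

# Example: an assumed conv-layer activation tensor on an 8-bit fixed-point (int8) datapath.
x = np.random.randint(-128, 128, size=(64, 32, 32), dtype=np.int8)
y, mask = relu_forward(x)
grad_in = relu_backward(np.ones_like(x), mask)

full_bytes = x.nbytes                  # storing the activation itself for backprop
mask_bytes = np.packbits(mask).nbytes  # storing only the packed 1-bit mask
print(f"stored {mask_bytes} B instead of {full_bytes} B "
      f"({100 * (1 - mask_bytes / full_bytes):.1f}% reduction)")
```

For int8 activations this sketch saves roughly 87.5% of the storage (1 bit kept per 8-bit element); the accelerator's dedicated forward/backward cores exploit the same kind of reduction in hardware.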
| Original language | English |
|---|---|
| Title of host publication | Proceedings - 2019 IEEE Asian Solid-State Circuits Conference, A-SSCC 2019 |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| Pages | 57-60 |
| Number of pages | 4 |
| ISBN (Electronic) | 9781728151069 |
| DOIs | |
| State | Published - Nov 2019 |
| Event | 15th IEEE Asian Solid-State Circuits Conference, A-SSCC 2019, Macao, China, 4 Nov 2019 → 6 Nov 2019 |
Publication series
| Name | Proceedings - 2019 IEEE Asian Solid-State Circuits Conference, A-SSCC 2019 |
|---|---|
| Volume | 2019-November |
Conference
| Conference | 15th IEEE Asian Solid-State Circuits Conference, A-SSCC 2019 |
|---|---|
| Country/Territory | China |
| City | Macao |
| Period | 4/11/19 → 6/11/19 |
Bibliographical note
Publisher Copyright: © 2019 Institute of Electrical and Electronics Engineers Inc. All rights reserved.