Abstract
Recently, the size of deep learning datasets has grown rapidly, and file data access can significantly degrade deep learning performance. To quantify this, we analyze the file access characteristics of deep learning workloads and find that they differ considerably from those of traditional workloads. Specifically, during deep learning training, all file data are accessed in random order, which makes it difficult to improve file access performance through caching. To address this, we present a cache-friendly file data management policy that accelerates data access in deep learning. Unlike conventional training processes, which shuffle the entire dataset every epoch, our policy defines a shuffling unit called a bundle, improving the spatial locality of file access without compromising the model's training efficiency. We also improve the temporal locality of data access by arranging bundles in an alternating order from epoch to epoch. Experimental results show that our data management policy improves the file cache miss ratio by 17.0% and the training execution time by 24.7% when accessing file data in deep learning.
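The abstract does not specify how bundles are formed or what "alternating order" means precisely. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes that bundles are contiguous runs of sample indices following on-disk order, that samples are shuffled only within each bundle, and that bundle order alternates between forward (even epochs) and reversed (odd epochs). It also assumes that sample `i` lives in file block `i // samples_per_block`, and it counts misses of an LRU block cache to contrast the assumed policy with a full per-epoch shuffle. All function names and parameters are hypothetical, and the sketch does not reproduce the reported 17.0% / 24.7% figures.

```python
import random
from collections import OrderedDict


def make_bundles(num_samples: int, bundle_size: int) -> list[list[int]]:
    """Split sample indices into contiguous bundles (assumed to follow on-disk order)."""
    return [list(range(start, min(start + bundle_size, num_samples)))
            for start in range(0, num_samples, bundle_size)]


def bundle_epoch_order(bundles: list[list[int]], epoch: int,
                       rng: random.Random) -> list[int]:
    """Hypothetical bundle policy: shuffle only inside each bundle and
    alternate the bundle order (forward on even epochs, reversed on odd)."""
    ordered = bundles if epoch % 2 == 0 else bundles[::-1]
    order: list[int] = []
    for bundle in ordered:
        members = list(bundle)   # copy so the bundle definition is not mutated
        rng.shuffle(members)     # randomness is confined to one bundle
        order.extend(members)
    return order


def full_shuffle_order(num_samples: int, rng: random.Random) -> list[int]:
    """Conventional baseline: a fresh global permutation every epoch."""
    order = list(range(num_samples))
    rng.shuffle(order)
    return order


def lru_misses(samples: list[int], samples_per_block: int, capacity: int) -> int:
    """Count misses of an LRU cache holding `capacity` blocks, assuming
    sample i is stored in file block i // samples_per_block."""
    cache: OrderedDict[int, None] = OrderedDict()
    misses = 0
    for sample in samples:
        block = sample // samples_per_block
        if block in cache:
            cache.move_to_end(block)       # mark as most recently used
        else:
            misses += 1
            cache[block] = None
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used block
    return misses


if __name__ == "__main__":
    N, BUNDLE, PER_BLOCK, CAPACITY, EPOCHS = 1000, 100, 10, 30, 2
    rng = random.Random(0)
    bundles = make_bundles(N, BUNDLE)
    bundled = [s for e in range(EPOCHS) for s in bundle_epoch_order(bundles, e, rng)]
    shuffled = [s for _ in range(EPOCHS) for s in full_shuffle_order(N, rng)]
    print("bundle policy misses:", lru_misses(bundled, PER_BLOCK, CAPACITY))
    print("full shuffle misses:", lru_misses(shuffled, PER_BLOCK, CAPACITY))
```

Under these assumptions, within-bundle shuffling keeps each stretch of accesses inside a small set of blocks (spatial locality). Reversing the bundle order means the bundles still cached at the end of one epoch are the first ones read in the next (temporal locality).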
Original language | English |
---|---|
Title of host publication | Proceeding - EECSI 2023 |
Subtitle of host publication | 10th Electrical Engineering, Computer Science and Informatics Conference |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 328-331 |
Number of pages | 4 |
ISBN (Electronic) | 9798350306866 |
DOIs | |
State | Published - 2023 |
Event | 10th International Conference on Electrical Engineering, Computer Science and Informatics, EECSI 2023 - Palembang, Indonesia Duration: 20 Sep 2023 → 21 Sep 2023 |
Publication series
Name | International Conference on Electrical Engineering, Computer Science and Informatics (EECSI) |
---|---|
ISSN (Print) | 2407-439X |
Conference
Conference | 10th International Conference on Electrical Engineering, Computer Science and Informatics, EECSI 2023 |
---|---|
Country/Territory | Indonesia |
City | Palembang |
Period | 20/09/23 → 21/09/23 |
Bibliographical note
Publisher Copyright: © 2023 IEEE.
Keywords
- data management
- dataset
- deep learning
- file access
- file cache