File Access Characteristics of Deep Learning Workloads and Cache-Friendly Data Management

Jeongha Lee, Hyokyung Bahn

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Scopus citations

Abstract

The size of deep learning datasets has grown rapidly in recent years, and file data access can significantly degrade deep learning performance. To quantify this effect, we analyze the file access characteristics of deep learning workloads and find that they differ considerably from those of traditional workloads. Specifically, during training, all file data are accessed in random order, which makes it difficult to improve file access performance through caching. To address this, we present a cache-friendly file data management policy that accelerates data access in deep learning. Unlike conventional training processes, which shuffle the entire dataset every epoch, our policy defines a shuffling unit called a bundle and improves the spatial locality of file access without compromising the model's training efficiency. We also improve the temporal locality of data access by arranging bundles in an alternating order for each epoch. Experimental results show that our data management policy improves the file cache miss ratio by 17.0% and the training execution time by 24.7% when accessing file data in deep learning.
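
The bundle idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact algorithm, so everything below is an assumption made for illustration: a bundle is taken to be a run of consecutive sample indices (assumed to be adjacent on disk), samples are shuffled within each bundle, bundle order is reshuffled on even epochs, and the "alternating order" is interpreted as reversing the previous epoch's bundle order on odd epochs so that recently cached bundles are revisited first. The names `make_bundles` and `epoch_orders` are hypothetical and are not taken from the paper.

```python
import random
from typing import Iterator, List


def make_bundles(num_samples: int, bundle_size: int) -> List[List[int]]:
    """Group consecutive sample indices (assumed adjacent on disk) into bundles."""
    return [
        list(range(start, min(start + bundle_size, num_samples)))
        for start in range(0, num_samples, bundle_size)
    ]


def epoch_orders(
    num_samples: int, bundle_size: int, num_epochs: int, seed: int = 0
) -> Iterator[List[int]]:
    """Yield one sample access order per epoch (illustrative policy only)."""
    rng = random.Random(seed)
    bundles = make_bundles(num_samples, bundle_size)
    bundle_order = list(range(len(bundles)))
    for epoch in range(num_epochs):
        if epoch % 2 == 0:
            # Even epochs: draw a fresh random bundle order.
            rng.shuffle(bundle_order)
        else:
            # Odd epochs (assumed alternating rule): reverse the previous
            # order so the most recently read bundles are visited first.
            bundle_order.reverse()
        order: List[int] = []
        for b in bundle_order:
            samples = bundles[b][:]
            rng.shuffle(samples)  # keep randomness inside each bundle
            order.extend(samples)
        yield order


if __name__ == "__main__":
    for epoch, order in enumerate(epoch_orders(10, 4, 2, seed=1)):
        print(epoch, order)
```

Under these assumptions every sample is still visited exactly once per epoch, while reads stay clustered within a bundle (spatial locality) and the start of an odd epoch reuses the bundles read at the end of the preceding epoch (temporal locality).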

Original language: English
Title of host publication: Proceeding - EECSI 2023
Subtitle of host publication: 10th Electrical Engineering, Computer Science and Informatics Conference
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 328-331
Number of pages: 4
ISBN (Electronic): 9798350306866
State: Published - 2023
Event: 10th International Conference on Electrical Engineering, Computer Science and Informatics, EECSI 2023 - Palembang, Indonesia
Duration: 20 Sep 2023 to 21 Sep 2023

Publication series

Name: International Conference on Electrical Engineering, Computer Science and Informatics (EECSI)
ISSN (Print): 2407-439X

Conference

Conference: 10th International Conference on Electrical Engineering, Computer Science and Informatics, EECSI 2023
Country/Territory: Indonesia
City: Palembang
Period: 20/09/23 to 21/09/23

Bibliographical note

Publisher Copyright:
© 2023 IEEE.

Keywords

  • data management
  • dataset
  • deep learning
  • file access
  • file cache
