VitBit: Enhancing Embedded GPU Performance for AI Workloads through Register Operand Packing

Jaebeom Jeon, Minseong Gil, Junsu Kim, Jaeyong Park, Gunjae Koo, Myung Kuk Yoon, Yunho Oh

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The rapid advancement of Artificial Intelligence (AI) necessitates significant enhancements in the energy efficiency of Graphics Processing Units (GPUs) for Deep Neural Network (DNN) workloads. Such a challenge is particularly critical for embedded GPUs, which operate within stringent power constraints. Traditional GPU architectures, designed to support a limited set of numeric formats, face challenges in meeting the diverse requirements of modern AI applications. These applications demand support for various numeric formats to optimize computational speed and efficiency. This paper proposes VitBit, a novel software technique designed to overcome these limitations by enabling efficient processing of arbitrary integer format values, especially those 8 bits or fewer, which are increasingly prevalent in AI workloads. VitBit introduces two key innovations: the packing of arbitrary integer formats for parallel computation and the simultaneous execution of Tensor cores, INT and FP (Integer and Floating-Point) CUDA cores. This approach leverages the architectural features of modern GPUs, such as those based on NVIDIA Ampere architecture, which allows concurrent operation of FP32 and INT32 cores at full throughput. Our evaluation of VitBit on NVIDIA Jetson AGX Orin demonstrates substantial improvements in arithmetic density and peak throughput, achieving up to a 22% reduction in execution time for benchmark AI workloads without compromising inference accuracy. VitBit effectively bridges the gap between current hardware capabilities and the computational demands of AI, offering a scalable and cost-effective method for enhancing GPU performance in AI applications.
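The register operand packing idea named in the title can be illustrated with a minimal sketch. The abstract does not describe VitBit's actual packing layout, so everything below is an assumption made for illustration: two unsigned 8-bit operands are placed in separate 16-bit lanes of an emulated 32-bit register, and a single multiply by an 8-bit scalar produces both products at once. The names `pack2`, `mul_packed`, and `unpack2` are hypothetical and do not come from the paper.

```python
MASK32 = 0xFFFFFFFF  # emulate a 32-bit integer register
LANE_BITS = 16       # assumed lane width: room for one 8-bit x 8-bit product
LANE_MASK = (1 << LANE_BITS) - 1


def pack2(lo: int, hi: int) -> int:
    """Place two unsigned 8-bit values in the low and high 16-bit lanes."""
    if not (0 <= lo < 256 and 0 <= hi < 256):
        raise ValueError("operands must fit in 8 unsigned bits")
    return (lo | (hi << LANE_BITS)) & MASK32


def mul_packed(packed: int, w: int) -> int:
    """Multiply both lanes by the same unsigned 8-bit scalar in one operation.

    Each product is at most 255 * 255 = 65025, which fits in 16 bits,
    so no carry crosses from the low lane into the high lane.
    """
    if not 0 <= w < 256:
        raise ValueError("scalar must fit in 8 unsigned bits")
    return (packed * w) & MASK32


def unpack2(packed: int) -> tuple[int, int]:
    """Extract the two 16-bit lanes as separate integers."""
    return packed & LANE_MASK, (packed >> LANE_BITS) & LANE_MASK


# One emulated 32-bit multiply yields both 200 * 255 and 37 * 255.
products = unpack2(mul_packed(pack2(200, 37), 255))
print(products)  # (51000, 9435)
```

In this sketch the lane width limits how much can be accumulated before one lane overflows into the next. With narrower operands (for example 4-bit values) the same 16-bit lane leaves headroom for summing several products, which is one reason packing becomes more attractive as bit widths shrink.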

Original language: English
Title of host publication: 53rd International Conference on Parallel Processing, ICPP 2024 - Main Conference Proceedings
Publisher: Association for Computing Machinery
Pages: 1012-1021
Number of pages: 10
ISBN (Electronic): 9798400708428
State: Published - 12 Aug 2024
Event: 53rd International Conference on Parallel Processing, ICPP 2024 - Gotland, Sweden
Duration: 12 Aug 2024 to 15 Aug 2024

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 53rd International Conference on Parallel Processing, ICPP 2024
Country/Territory: Sweden
City: Gotland
Period: 12/08/24 to 15/08/24

Bibliographical note

Publisher Copyright:
© 2024 Owner/Author.

Keywords

  • Deep Neural Network
  • GPU
