INTERPRET: Inter-Warp Register Reuse for GPU Tensor Core

Jae Seok Kwak, Myung Kuk Yoon, Ipoom Jeong, Seunghyun Jin, Won Woo Ro

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Tensor cores in the recent NVIDIA GPUs are under the spotlight due to their superior computation throughput for general matrix-matrix multiplication (GEMM) that has been widely used for deep learning applications. For massive-scale GEMMs, the entire matrix is practically divided into sub-matrices and assigned to multiple thread blocks and warps, and then processed by the tensor cores. Meanwhile, the same sub-matrix is regularly reused as an input to different sub-GEMMs, which causes redundant load operations from different warps and waste of register file spaces. To tackle this issue, we propose INTERPRET, a novel tensor core microarchitecture designed to minimize unnecessary accesses to the cache/memory hierarchy by leveraging the inter-warp data reuse characteristics. INTERPRET adopts a register renaming scheme to reduce the redundant load requests as well as the waste of register files, resulting in the reduction of the effective data load latency. INTERPRET further improves performance via non-speculative tensor preloading by leveraging the register file space saved by the register renaming. As INTERPRET is implemented based on the data access patterns of tensor core operations exhibiting a high level of regularity, the synergistic integration of the register renaming and tensor preloading can significantly improve the processing efficiency. Our experiments show that the proposed design achieves an average speedup of 34.1% and reduces energy consumption by 27.9%.
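
The inter-warp reuse described in the abstract can be illustrated with a small counting sketch. It is not the INTERPRET mechanism itself. It only assumes a simple mapping in which each warp computes one output tile C[i][j] and loads the A and B tiles it needs at every step along K. The function name `count_tile_loads` and the tile-grid parameters are hypothetical and chosen for illustration.

```python
from collections import Counter


def count_tile_loads(tiles_m, tiles_n, tiles_k):
    """Count A/B tile loads for a tiled GEMM, one warp per output tile.

    Assumption (not from the paper): warp (i, j) computes C[i][j] and, for
    each k, loads tile A[i][k] and tile B[k][j] on its own.
    Returns (total_loads, distinct_tiles).
    """
    loads = Counter()
    for i in range(tiles_m):          # output tile row
        for j in range(tiles_n):      # output tile column
            for k in range(tiles_k):  # step along the shared K dimension
                loads[("A", i, k)] += 1
                loads[("B", k, j)] += 1
    total_loads = sum(loads.values())
    distinct_tiles = len(loads)
    return total_loads, distinct_tiles


if __name__ == "__main__":
    total, distinct = count_tile_loads(4, 4, 2)
    print(f"total loads: {total}, distinct tiles: {distinct}")
```

Under these assumptions, a 4x4 grid of output tiles with 2 steps along K issues 64 tile loads for only 16 distinct tiles, so each tile is requested by four different warps. This is the kind of redundancy the abstract says the register renaming scheme targets.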

Original language: English
Title of host publication: Proceedings - 2023 32nd International Conference on Parallel Architecture and Compilation Techniques, PACT 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 309-319
Number of pages: 11
ISBN (Electronic): 9798350342543
State: Published - 2023
Event: 32nd International Conference on Parallel Architecture and Compilation Techniques, PACT 2023 - Vienna, Austria
Duration: 21 Oct 2023 to 25 Oct 2023

Publication series

Name: Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT
ISSN (Print): 1089-795X

Conference

Conference: 32nd International Conference on Parallel Architecture and Compilation Techniques, PACT 2023
Country/Territory: Austria
City: Vienna
Period: 21/10/23 to 25/10/23

Bibliographical note

Publisher Copyright:
© 2023 IEEE.

Keywords

  • GEMM
  • GPU
  • microarchitecture
  • Tensor Core
