Abstract
Tensor cores in recent NVIDIA GPUs are in the spotlight for their superior throughput on general matrix-matrix multiplication (GEMM), which is widely used in deep learning applications. For massive-scale GEMMs, the matrix is in practice divided into sub-matrices that are assigned to multiple thread blocks and warps and then processed by the tensor cores. However, the same sub-matrix is regularly reused as an input to different sub-GEMMs, which causes redundant load operations from different warps and wastes register file space. To tackle this issue, we propose INTERPRET, a novel tensor core microarchitecture designed to minimize unnecessary accesses to the cache/memory hierarchy by leveraging inter-warp data reuse. INTERPRET adopts a register renaming scheme that reduces both the redundant load requests and the wasted register file space, thereby reducing the effective data load latency. INTERPRET further improves performance through non-speculative tensor preloading, which uses the register file space freed by register renaming. Because INTERPRET is built on the highly regular data access patterns of tensor core operations, the synergistic integration of register renaming and tensor preloading can significantly improve processing efficiency. Our experiments show that the proposed design achieves an average speedup of 34.1% and reduces energy consumption by 27.9%.
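The inter-warp reuse described in the abstract can be illustrated with a toy counting model. This is only a sketch under stated assumptions, not the paper's mechanism: the function name `count_tile_loads`, the one-warp-per-output-tile mapping, and the tile counts are all hypothetical, and caches and thread-block boundaries are ignored.

```python
from itertools import product


def count_tile_loads(m_tiles, n_tiles, k_tiles):
    """Count sub-matrix (tile) loads for C = A x B in a tiled GEMM.

    Assumed model (hypothetical): one warp owns each output tile C[i][j]
    and loads A[i][k] and B[k][j] for every k along the reduction axis.
    Returns (loads issued by all warps, number of distinct tiles touched).
    """
    issued = 0
    distinct = set()
    for i, j in product(range(m_tiles), range(n_tiles)):  # one warp per C tile
        for k in range(k_tiles):
            for tile in (("A", i, k), ("B", k, j)):
                issued += 1          # every warp issues its own load
                distinct.add(tile)   # the same tile may be requested by many warps
    return issued, len(distinct)


issued, distinct = count_tile_loads(4, 4, 8)
print(f"issued={issued}, distinct={distinct}, redundant={issued - distinct}")
```

In this model, each A tile is requested by every warp in the same tile row and each B tile by every warp in the same tile column, so the gap between issued and distinct loads grows with the number of output tiles. That gap is the kind of redundancy the abstract attributes to inter-warp data reuse.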
Original language | English |
---|---|
Title of host publication | Proceedings - 2023 32nd International Conference on Parallel Architecture and Compilation Techniques, PACT 2023 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 309-319 |
Number of pages | 11 |
ISBN (Electronic) | 9798350342543 |
State | Published - 2023 |
Event | 32nd International Conference on Parallel Architecture and Compilation Techniques, PACT 2023 - Vienna, Austria. Duration: 21 Oct 2023 → 25 Oct 2023 |
Publication series
Name | Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT |
---|---|
ISSN (Print) | 1089-795X |
Conference
Conference | 32nd International Conference on Parallel Architecture and Compilation Techniques, PACT 2023 |
---|---|
Country/Territory | Austria |
City | Vienna |
Period | 21/10/23 → 25/10/23 |
Bibliographical note
Publisher Copyright: © 2023 IEEE.
Keywords
- GEMM
- GPU
- microarchitecture
- Tensor Core