TY - JOUR
T1 - Conflict-aware compiler for hierarchical register file on GPUs
AU - Jeong, Eunbi
AU - Park, Eun Seong
AU - Koo, Gunjae
AU - Oh, Yunho
AU - Yoon, Myung Kuk
N1 - Publisher Copyright: © 2024 Elsevier B.V.
PY - 2024/4
Y1 - 2024/4
N2 - Modern graphics processing units (GPUs) leverage a high degree of thread-level parallelism, necessitating large register files for storing numerous thread contexts. To reduce the energy consumption of traditional static random access memory (SRAM)-based register files, recent research has explored non-volatile memory (NVM) for implementing register files. The hierarchical register file (HI-RF) combines SRAM-based register caches with NVM-based register files. In HI-RF, the register cache acts as a write buffer, indexed using both register IDs and warp IDs. HI-RF uses a direct-mapped register cache with two indexing schemes: a concatenating scheme and a thread context-aware scheme. Compiler-assigned register IDs significantly impact cache conflicts, particularly among registers sharing the same LSBs. To address this, we introduce a conflict-aware compiler (CAC) for GPUs equipped with HI-RF. CAC optimizes register assignments based on approximated register write counts. Our evaluation demonstrates that CAC improves performance by 11.1% and 5.9% with the concatenating and thread context-aware indexing schemes, respectively, when compared to a conventional compiler. Simultaneously, it reduces the energy consumption by approximately 73.0 percentage points compared to SRAM for both indexing schemes.
AB - Modern graphics processing units (GPUs) leverage a high degree of thread-level parallelism, necessitating large register files for storing numerous thread contexts. To reduce the energy consumption of traditional static random access memory (SRAM)-based register files, recent research has explored non-volatile memory (NVM) for implementing register files. The hierarchical register file (HI-RF) combines SRAM-based register caches with NVM-based register files. In HI-RF, the register cache acts as a write buffer, indexed using both register IDs and warp IDs. HI-RF uses a direct-mapped register cache with two indexing schemes: a concatenating scheme and a thread context-aware scheme. Compiler-assigned register IDs significantly impact cache conflicts, particularly among registers sharing the same LSBs. To address this, we introduce a conflict-aware compiler (CAC) for GPUs equipped with HI-RF. CAC optimizes register assignments based on approximated register write counts. Our evaluation demonstrates that CAC improves performance by 11.1% and 5.9% with the concatenating and thread context-aware indexing schemes, respectively, when compared to a conventional compiler. Simultaneously, it reduces the energy consumption by approximately 73.0 percentage points compared to SRAM for both indexing schemes.
KW - Compiler optimization
KW - Graphics processing units
KW - Hierarchical register files
KW - Non-volatile memory
UR - http://www.scopus.com/inward/record.url?scp=85187201892&partnerID=8YFLogxK
U2 - 10.1016/j.sysarc.2024.103099
DO - 10.1016/j.sysarc.2024.103099
M3 - Article
AN - SCOPUS:85187201892
SN - 1383-7621
VL - 149
JO - Journal of Systems Architecture
JF - Journal of Systems Architecture
M1 - 103099
ER -