Triple-A: Early Operand Collector Allocation for Maximizing GPU Register Bank Utilization

Ipoom Jeong, Eunbi Jeong, Nam Sung Kim, Myung Kuk Yoon

Research output: Contribution to journal › Article › peer-review

Abstract

Recent GPUs provisioned with large register files (RFs) cannot fully utilize the bandwidth between the RFs and execution pipelines, as the current policy for allocating operand (OP) collectors defers RF accesses until all the source OPs become ready. To tackle this issue, this letter introduces a new OP collector allocation mechanism called Triple-A. Triple-A comprises four key operations. First, Triple-A proactively allocates an OP collector (OC) to a warp instruction even if one of its source OPs is not yet ready, taking advantage of GPUs' in-order execution. Second, a computation result can be directly forwarded to an early allocated OC along a data dependence, reducing OP loading time from the RFs. Third, Triple-A bypasses RF write operations if the forwarded data is not consumed by any other instruction. Finally, the early allocation is further enhanced with latency-aware optimization, alleviating the potential performance degradation caused by allocating OCs aggressively. Together, these techniques synergistically improve register bank utilization, demonstrating a 14.1% improvement in performance and an 11.8% reduction in RF energy consumption compared to state-of-the-art GPUs.
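
The first three operations can be illustrated with a small Python sketch. This is not the authors' implementation: the `Instr` type, the function names, the one-pending-operand limit, and the lookahead-window liveness check are all assumptions made for illustration, and the latency-aware optimization is omitted because the abstract does not describe it in enough detail to model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical warp instruction: one destination register, some sources."""
    dst: str
    srcs: tuple


def can_allocate_baseline(instr, ready):
    """Baseline policy from the abstract: wait until every source is ready."""
    return all(s in ready for s in instr.srcs)


def can_allocate_early(instr, ready, max_pending=1):
    """Early allocation (assumed form): tolerate one not-yet-ready source."""
    pending = [s for s in instr.srcs if s not in ready]
    return len(pending) <= max_pending


def can_bypass_rf_write(producer, consumer, later):
    """True if producer.dst is read only by `consumer` before being overwritten.

    `later` is an assumed lookahead window of instructions after the producer.
    If the window ends without an overwrite, liveness is unknown, so the
    sketch conservatively keeps the RF write.
    """
    for instr in later:
        if instr is not consumer and producer.dst in instr.srcs:
            return False  # another instruction still needs the value from the RF
        if instr.dst == producer.dst:
            return True   # register overwritten: the forwarded value is dead
    return False


# r2 = f(r0, r1); r3 = f(r2, r0); r2 = f(r0)
i0 = Instr("r2", ("r0", "r1"))
i1 = Instr("r3", ("r2", "r0"))
i2 = Instr("r2", ("r0",))
ready = {"r0", "r1"}  # r2 is still being computed by i0

print(can_allocate_baseline(i1, ready))       # False: r2 is not ready
print(can_allocate_early(i1, ready))          # True: only one source pending
print(can_bypass_rf_write(i0, i1, [i1, i2]))  # True: i1 is the sole reader of r2
```

In this toy trace, `i1` receives an OC while `r2` is still in flight, the result of `i0` would be forwarded to that OC, and the RF write of `r2` can be skipped because `i2` overwrites `r2` before any other instruction reads it.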

Original language: English
Pages (from-to): 206-209
Number of pages: 4
Journal: IEEE Embedded Systems Letters
Volume: 16
Issue number: 2
State: Published - 1 Jun 2024

Bibliographical note

Publisher Copyright:
© 2009-2012 IEEE.

Keywords

  • Data forwarding
  • graphics processing units (GPUs)
  • operand collector (OC)
  • register files (RFs)
