Marching Page Walks: Batching and Concurrent Page Table Walks for Enhancing GPU Throughput

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Scopus citations

Abstract

Virtual memory, supported by address translation hardware, is a key technique for expanding programmability and memory management in GPUs. However, the GPU execution model places heavy pressure on its translation hardware, largely because of a mismatch between the behavior of page table walkers and that of the thousands of concurrently running threads they serve. In GPU workloads, many threads simultaneously access many pages, necessitating a large number of translations, whereas each walker handles only a single walk request at a time. This limitation significantly increases the queueing latency of walk requests, which we observe to be a major bottleneck in servicing page table walks on GPUs. To tackle this challenge, we investigate a page walker design that allows multiple walk requests to be handled together in batches, and we make the following observations: 1) allowing a page walker to issue more than one memory request at a time significantly improves walker throughput, and 2) GPU applications tend to access pages across wide address ranges concurrently. Leveraging these observations, we propose Marching Page Walks (MPW), which effectively mitigates contention in GPU page table walkers. MPW scans pending walk requests to identify those that can be grouped together, then batches these requests and handles them concurrently by issuing multiple memory instructions. Experiments show that MPW reduces the queueing latency of page walks by 86.7% and improves GPU performance by 55.6% over the baseline design.
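The abstract describes the mechanism only at a high level. The toy model below is a minimal sketch of why batching walks can cut queueing latency, assuming a 4-level radix page table with 4 KB pages, no page walk caches, and a fixed latency per page-table memory access. A baseline walker serves one request at a time; a hypothetical batched walker scans the pending queue, groups requests, and issues their page-table reads concurrently. The grouping rule (requests that share a last-level page-table node), the constants, and the names `serial_walks`, `batched_walks`, `batch_size`, and `max_inflight` are illustrative assumptions, not MPW's actual design, and the numbers it produces say nothing about the paper's reported results.

```python
import math
from collections import deque

LEVELS = 4           # assumption: 4-level radix page table
BITS_PER_LEVEL = 9   # assumption: 512 entries per page-table node
PAGE_SHIFT = 12      # assumption: 4 KB pages
MEM_LATENCY = 100    # hypothetical cycles per page-table memory access


def level_indices(vaddr: int) -> tuple[int, ...]:
    """Per-level page-table indices for vaddr, root level first."""
    vpn = vaddr >> PAGE_SHIFT
    mask = (1 << BITS_PER_LEVEL) - 1
    return tuple(
        (vpn >> (BITS_PER_LEVEL * (LEVELS - 1 - lvl))) & mask
        for lvl in range(LEVELS)
    )


def serial_walks(vaddrs: list[int]) -> list[int]:
    """Baseline: one walker, one walk at a time, one memory access per level.
    Returns each request's queueing latency (cycles waited before starting)."""
    walk_time = LEVELS * MEM_LATENCY
    return [i * walk_time for i in range(len(vaddrs))]


def batched_walks(vaddrs: list[int], batch_size: int, max_inflight: int) -> list[int]:
    """Hypothetical batched walker. The head request is grouped with pending
    requests that share its last-level page-table node; the group is walked
    level by level, issuing up to max_inflight distinct PTE reads at once."""
    pending = deque(enumerate(vaddrs))
    queue_lat = [0] * len(vaddrs)
    now = 0
    while pending:
        head_id, head_va = pending.popleft()
        key = level_indices(head_va)[:-1]  # path to the leaf page-table node
        batch = [(head_id, head_va)]
        # Scan the rest of the queue for requests that can join this batch.
        for item in list(pending):
            if len(batch) == batch_size:
                break
            if level_indices(item[1])[:-1] == key:
                batch.append(item)
                pending.remove(item)
        for req_id, _ in batch:
            queue_lat[req_id] = now
        # Requests sharing a PTE at some level need only one access for it.
        paths = [level_indices(va) for _, va in batch]
        for lvl in range(LEVELS):
            distinct = len({p[: lvl + 1] for p in paths})
            now += math.ceil(distinct / max_inflight) * MEM_LATENCY
    return queue_lat


# 64 requests to 64 consecutive 4 KB pages, all under one leaf node.
reqs = [0x7F00_0000_0000 + i * 4096 for i in range(64)]
base = serial_walks(reqs)
batched = batched_walks(reqs, batch_size=16, max_inflight=8)
print(sum(base) / len(base), sum(batched) / len(batched))  # prints 12600.0 750.0
```

In this toy setting each batch of 16 shares its three upper-level entries (one read each) and splits its 16 leaf reads into two rounds of 8, so a batch takes 500 cycles instead of 16 × 400. Real walkers also have page walk caches and PTE cache-line sharing, which this sketch ignores.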

Original language: English
Title of host publication: Proceedings - 2025 IEEE International Symposium on High Performance Computer Architecture, HPCA 2025
Publisher: IEEE Computer Society
Pages: 1662-1677
Number of pages: 16
ISBN (Electronic): 9798331506476
State: Published - 2025
Event: 31st IEEE International Symposium on High Performance Computer Architecture, HPCA 2025 - Las Vegas, United States
Duration: 1 Mar 2025 to 5 Mar 2025

Publication series

Name: Proceedings - International Symposium on High-Performance Computer Architecture
ISSN (Print): 1530-0897

Conference

Conference: 31st IEEE International Symposium on High Performance Computer Architecture, HPCA 2025
Country/Territory: United States
City: Las Vegas
Period: 1/03/25 to 5/03/25

Bibliographical note

Publisher Copyright: © 2025 IEEE.

Keywords

  • Address Translation
  • Graphics Processing Unit
  • Page Table Walker
  • Virtual Memory

