This paper conducts a detailed study of the factors affecting the operation stalls in terms of the fetch group size on the warp scheduler of GPUs. Throughout this paper, we reveal that the size of a fetch group is highly involved for hiding various types of operation stalls: Short latency stalls, long latency stalls, and Load/Store Unit (LSU) stalls. The scheduler with a small fetch group cannot hide short latency stalls due to the limited number of warps in a fetch group. In contrast, the scheduler with a large fetch group cannot hide long latency and LSU stalls due to the limited number of fetch groups and the lack of memory subsystems, respectively. To hide various types of stalls, this paper proposes a Dynamic Resizing on Active Warps (DRAW) scheduler which adjusts the size of a fetch group dynamically based on the execution phases of applications. For the applications that have the best performance at LRR (one fetch group), the DRAW scheduler matches the performance of LRR and outperforms TL (multiple fetch groups) by 22.7 percent. In addition, for the applications that have the best performance at TL, our scheduler achieves 11.0 and 5.5 percent better performance compared to LRR and TL, respectively.
|Number of pages||15|
|Journal||IEEE Transactions on Parallel and Distributed Systems|
|State||Published - 1 Nov 2017|
- General Purpose on GPUs (GPGPUs)
- Graphics Processing Units (GPUs)
- warp scheduler