TY - GEN
T1 - Analysis of Thread Block Scheduling Algorithms for General Purpose GPU Systems
AU - Park, Soyeon
AU - Cho, Kyungwoon
AU - Bahn, Hyokyung
N1 - Funding Information:
This work was supported by the ICT R&D program of MSIP/IITP (2018-0-00549, extremely scalable order preserving OS for manycore and non-volatile memory) and (2019-0-00074, developing system software technologies for emerging new memory that adaptively learn workload characteristics). Hyokyung Bahn is the corresponding author of this paper.
Publisher Copyright:
© IEEE 2022.
PY - 2021
Y1 - 2021
AB - Modern GPGPUs (General-Purpose Graphics Processing Units) can execute thousands of threads simultaneously. However, the resource utilization of GPGPUs in real systems is limited because balancing load across SMs (Streaming Multiprocessors) is difficult when scheduling thread blocks, which are the basic units of resource allocation in a GPGPU. The current hardware scheduler allocates thread blocks to SMs in Round-Robin order. Although this is simple and easy to implement, we show that Round-Robin is inefficient when thread blocks of heterogeneous workloads are mixed. In such environments, efficient resource sharing is challenging because workloads have different resource usage patterns, yet scheduling decisions must be made instantly. In this paper, we present a new thread block scheduling algorithm that analyzes the load of SMs and the characteristics of pending thread blocks. Specifically, we formulate thread block scheduling as a bin-packing problem and aim to minimize the internal fragmentation of SMs by planning in advance a size-aware filling of thread blocks across all SMs. To do so, we maintain multiple queues for incoming thread blocks according to their sizes and schedule them with the load balance of SMs in mind. Our experimental results under a wide range of workload conditions show that the proposed algorithm improves GPGPU performance by 24.8% on average compared to the Round-Robin scheduler.
KW - GPGPU
KW - load balancing
KW - multitasking
KW - resource utilization
KW - thread block scheduler
UR - http://www.scopus.com/inward/record.url?scp=85127890905&partnerID=8YFLogxK
U2 - 10.1109/CSDE53843.2021.9718419
DO - 10.1109/CSDE53843.2021.9718419
M3 - Conference contribution
AN - SCOPUS:85127890905
T3 - 2021 IEEE Asia-Pacific Conference on Computer Science and Data Engineering, CSDE 2021
BT - 2021 IEEE Asia-Pacific Conference on Computer Science and Data Engineering, CSDE 2021
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 8 December 2021 through 10 December 2021
ER -