Performance analysis of thread block schedulers in GPGPU and its implications

Kyungwoon Cho, Hyokyung Bahn

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

A GPGPU (General-Purpose Graphics Processing Unit) has hardware resources that can execute tens of thousands of threads simultaneously. In practice, however, parallelism is limited because resources are allocated in units of thread blocks, which current GPGPU systems do not manage judiciously. To schedule threads, a specialized hardware scheduler allocates thread blocks to computing units called SMs (Streaming Multiprocessors) in a Round-Robin manner. Although hardware scheduling is simple and fast, we observe that Round-Robin scheduling is inefficient in GPGPU because it considers neither the workload characteristics of threads nor the resource balance among SMs. In this article, we present a new thread block scheduling model that can analyze and quantify the performance of thread block scheduling. We implement our model as a GPGPU scheduling simulator and show that the conventional thread block scheduling provided in GPGPU hardware does not perform well as the workload becomes heavy. Specifically, we observe that the performance degradation of Round-Robin can be eliminated by adopting DFA (Depth First Allocation), which is simple but scalable. Moreover, because our simulator is organized as modules within a framework and is publicly available to other researchers, various scheduling policies can be incorporated into it to evaluate the performance of GPGPU schedulers.
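
The difference in allocation order between the two policies named in the abstract can be illustrated with a small sketch. This is not the authors' simulator. It assumes that each SM has a single scalar `capacity`, that each thread block has a scalar resource `demand`, and that DFA means filling one SM as far as possible before moving to the next, an interpretation the abstract does not spell out. The function names `round_robin` and `depth_first` are hypothetical.

```python
def round_robin(demands, n_sms, capacity):
    """Offer each thread block to the next SM in cyclic order (assumed model)."""
    used = [0] * n_sms                    # resource units consumed per SM
    placed = [[] for _ in range(n_sms)]   # block ids assigned to each SM
    pending = []                          # blocks that fit on no SM
    cursor = 0                            # SM that gets the next offer
    for block_id, demand in enumerate(demands):
        for step in range(n_sms):
            idx = (cursor + step) % n_sms
            if used[idx] + demand <= capacity:
                used[idx] += demand
                placed[idx].append(block_id)
                cursor = (idx + 1) % n_sms
                break
        else:
            pending.append(block_id)
    return placed, pending


def depth_first(demands, n_sms, capacity):
    """Fill the lowest-numbered SM with room before using the next (assumed model)."""
    used = [0] * n_sms
    placed = [[] for _ in range(n_sms)]
    pending = []
    for block_id, demand in enumerate(demands):
        for idx in range(n_sms):
            if used[idx] + demand <= capacity:
                used[idx] += demand
                placed[idx].append(block_id)
                break
        else:
            pending.append(block_id)
    return placed, pending


# Four blocks of demand 2 on three SMs of capacity 4.
print(round_robin([2, 2, 2, 2], 3, 4))  # ([[0, 3], [1], [2]], [])
print(depth_first([2, 2, 2, 2], 3, 4))  # ([[0, 1], [2, 3], []], [])
```

The sketch shows only where blocks land under each policy. It says nothing about execution time, which is what the paper's simulator is built to quantify.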

Original language: English
Article number: 9121
Pages (from-to): 1-9
Number of pages: 9
Journal: Applied Sciences (Switzerland)
Volume: 10
Issue number: 24
State: Published - 2 Dec 2020

Bibliographical note

Funding Information:
This work was supported by the ICT R&D program of MSIP/IITP (2018-0-00549, Extremely scalable order preserving OS for manycore and non-volatile memory).

Publisher Copyright:
© 2020 by the authors. Licensee MDPI, Basel, Switzerland.

Keywords

  • GPGPU
  • Round-Robin
  • Thread block
  • Thread block scheduling
