TY - JOUR
T1 - Dynamic Resizing on Active Warps Scheduler to Hide Operation Stalls on GPUs
AU - Yoon, Myung Kuk
AU - Oh, Yunho
AU - Kim, Seung Hun
AU - Lee, Sangpil
AU - Kim, Deokho
AU - Ro, Won Woo
N1 - Funding Information:
This work is supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (No. NRF-2015R1A2A2A01008281). This paper is an extension of our previous study, “DRAW: Investigating benefits of adaptive fetch group size on GPU,” which appeared in the 2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS 2015). W. W. Ro is the corresponding author.
Publisher Copyright:
© 2017 IEEE.
PY - 2017/11/1
Y1 - 2017/11/1
AB - This paper presents a detailed study of how the fetch group size of a GPU warp scheduler affects operation stalls. We show that the fetch group size plays a major role in hiding three types of operation stalls: short-latency stalls, long-latency stalls, and Load/Store Unit (LSU) stalls. A scheduler with a small fetch group cannot hide short-latency stalls because each fetch group contains too few warps. In contrast, a scheduler with a large fetch group cannot hide long-latency stalls, because it has too few fetch groups, or LSU stalls, because memory subsystem resources are insufficient. To hide all of these stall types, this paper proposes the Dynamic Resizing on Active Warps (DRAW) scheduler, which adjusts the fetch group size dynamically according to the execution phase of the application. For applications that perform best with LRR (one fetch group), the DRAW scheduler matches the performance of LRR and outperforms TL (multiple fetch groups) by 22.7 percent. For applications that perform best with TL, DRAW achieves 11.0 and 5.5 percent better performance than LRR and TL, respectively.
KW - General Purpose on GPUs (GPGPUs)
KW - Graphics Processing Units (GPUs)
KW - warp scheduler
UR - http://www.scopus.com/inward/record.url?scp=85032452323&partnerID=8YFLogxK
U2 - 10.1109/TPDS.2017.2704080
DO - 10.1109/TPDS.2017.2704080
M3 - Article
AN - SCOPUS:85032452323
SN - 1045-9219
VL - 28
SP - 3142
EP - 3156
JO - IEEE Transactions on Parallel and Distributed Systems
JF - IEEE Transactions on Parallel and Distributed Systems
IS - 11
M1 - 7927466
ER -