Managing GPGPU resources in cloud systems is challenging as workloads with various resource usage patterns coexist. To determine the co-location of workloads, previous studies have shown that run-time performance profiling and dynamic relocation of workloads is necessary due to interference between workloads. However, this makes instant scheduling difficult and also affects the performance of workload executions. In this article, we show that efficient resource sharing in GPGPU is possible without run-time profiling if resource usage characteristics of workloads are analyzed down to a fine-grained unit level. To extract workload characteristics, we do not perform profiling at scheduling time, but separate profiling from scheduling, thereby reducing the run-time complexity of previous approaches. Specifically, we anatomize the characteristics of various GPGPU workloads and present a new scheduling policy that aims at balancing resource utilization by co-locating workloads with complementary resource demands. Simulation experiments under various virtual machine scenarios show that the proposed policy improves the GPGPU throughput by 119.5% on average and up to 191.7%.
Bibliographical notePublisher Copyright:
© 2021 IEEE.
- cloud system
- resource utilization
- thread block scheduler