DEPrune: Depth-wise Separable Convolution Pruning for Maximizing GPU Parallelism

  • Cheonjun Park
  • , Mincheol Park
  • , Hyunchan Moon
  • , Myung Kuk Yoon
  • , Seokjin Go
  • , Suhyun Kim
  • , Won Woo Ro

Research output: Contribution to journalConference articlepeer-review

2 Scopus citations

Abstract

Depth-wise Separable Convolution (DSConv) has a powerful representation even with fewer parameters and computation, leading to its adoption by almost all of the state-of-the-art CNN models. DSConv models are already compact making it hard to apply pruning, and there are few previous pruning techniques that target depth-wise convolution (DW-conv). In this paper, we present Depth-wise Separable Convolution Pruning (DEPrune), a novel pruning method applied to both point-wise and depth-wise convolutions. DEPrune is optimized by analyzing the computation of DSConv on GPUs. DEPrune employs a fine-grained pruning approach, yet it achieves the structured sparsity typically absent in fine-grained pruning, enabling practical hardware acceleration. Moreover, this method maintains a high pruning ratio without causing any accuracy drop. We additionally represent techniques that further enhance DEPrune performance: 1) balanced workload tuning (BWT), and 2) hardware-aware sparsity recalibration (HSR). Experiment results show that DEPrune achieves up to 3.74× practical speedup in DSConv inference on GPUs while maintaining the accuracy of EfficientNet-B0 on ImageNet.

Original languageEnglish
JournalAdvances in Neural Information Processing Systems
Volume37
StatePublished - 2024
Event38th Conference on Neural Information Processing Systems, NeurIPS 2024 - Vancouver, Canada
Duration: 9 Dec 202415 Dec 2024

Bibliographical note

Publisher Copyright:
© 2024 Neural information processing systems foundation. All rights reserved.

Fingerprint

Dive into the research topics of 'DEPrune: Depth-wise Separable Convolution Pruning for Maximizing GPU Parallelism'. Together they form a unique fingerprint.

Cite this