ViT-Slim: A Genetic Algorithm-based NAS Framework for Efficient Vision Transformer Design

Eunjoung Yoo, Jaehyeong Sim

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In industrial applications, pre-trained models fine-tuned on custom datasets often exceed the specific requirements of their target domains due to their oversized and resource-intensive nature. To address these inefficiencies, this study introduces ViT-Slim, a Genetic Algorithm-based Neural Architecture Search (GA-NAS) framework designed for optimizing Vision Transformer (ViT) architectures. ViT-Slim balances model size and performance by leveraging the strengths of genetic algorithms to explore and optimize design configurations. The framework integrates Dense Relative Localization (DrLoc) to inject inductive bias, enhancing ViT's ability to process custom datasets efficiently, even with limited data. Experimental validation on a custom depth dataset demonstrates that ViT-Slim achieves significant resource efficiency, reducing memory usage by up to 79.18% and parameters by 83.09%, with less than 1% accuracy loss compared to the baseline ViT-Small.
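As an illustration only, the kind of genetic search over ViT design configurations that the abstract describes can be sketched as below. Everything in it is an assumption made for the sketch and is not taken from the paper: the search space, the rough parameter-count formula, the `proxy_accuracy` placeholder (a real run would train and validate each candidate), the `size_weight` trade-off, and all function names. DrLoc is not modelled here.

```python
import random

# Hypothetical search space; the abstract does not list the actual genes.
SEARCH_SPACE = {
    "depth": [4, 6, 8, 10, 12],
    "embed_dim": [128, 192, 256, 320, 384],
    "mlp_ratio": [2, 3, 4],
}

# Encoder settings of the standard ViT-Small (12 blocks, width 384, MLP ratio 4).
BASELINE = {"depth": 12, "embed_dim": 384, "mlp_ratio": 4}


def param_count(cfg):
    """Rough encoder weight count per block: attention 4*d^2 plus MLP 2*r*d^2.
    Biases, norms, embeddings and the head are ignored."""
    d, r = cfg["embed_dim"], cfg["mlp_ratio"]
    return cfg["depth"] * (4 * d * d + 2 * r * d * d)


def proxy_accuracy(cfg):
    """Made-up stand-in for validation accuracy so the sketch runs without training."""
    size_ratio = param_count(cfg) / param_count(BASELINE)
    return 1.0 - 0.2 * (1.0 - size_ratio) ** 2


def fitness(cfg, size_weight=0.1):
    """Scalarised objective: reward accuracy, penalise relative model size."""
    size_ratio = param_count(cfg) / param_count(BASELINE)
    return proxy_accuracy(cfg) - size_weight * size_ratio


def random_config(rng):
    return {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}


def crossover(a, b, rng):
    # Uniform crossover: each gene is copied from one of the two parents.
    return {k: rng.choice([a[k], b[k]]) for k in SEARCH_SPACE}


def mutate(cfg, rng, rate=0.2):
    # Each gene is resampled from the search space with probability `rate`.
    return {k: (rng.choice(SEARCH_SPACE[k]) if rng.random() < rate else v)
            for k, v in cfg.items()}


def evolve(generations=20, pop_size=16, elite=4, seed=0):
    rng = random.Random(seed)
    population = [random_config(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[:elite]  # elitism: the best configs survive unchanged
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - elite)
        ]
        population = parents + children
    return max(population, key=fitness)


if __name__ == "__main__":
    best = evolve()
    print(best, param_count(best) / param_count(BASELINE))
```

The single weighted fitness is one simple way to trade size against accuracy; a multi-objective selection scheme would be another, and the abstract does not say which the authors use.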

Original language: English
Title of host publication: Proceedings - 2025 IEEE Conference on Artificial Intelligence, CAI 2025
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 796-802
Number of pages: 7
ISBN (Electronic): 9798331524005
State: Published - 2025
Event: 3rd IEEE Conference on Artificial Intelligence, CAI 2025 - Santa Clara, United States
Duration: 5 May 2025 to 7 May 2025

Publication series

Name: Proceedings - 2025 IEEE Conference on Artificial Intelligence, CAI 2025

Conference

Conference: 3rd IEEE Conference on Artificial Intelligence, CAI 2025
Country/Territory: United States
City: Santa Clara
Period: 5/05/25 to 7/05/25

Bibliographical note

Publisher Copyright:
© 2025 IEEE.

Keywords

  • Dense Relative Localization (DrLoc)
  • Genetic Algorithm-based Neural Architecture Search (GA-NAS)
  • Lightweight Vision Transformers
  • Multi-Objective Model Optimization
  • Resource-Efficient AI Models
  • Vision Transformer (ViT)
