Abstract
In industrial applications, pre-trained models fine-tuned on custom datasets are often oversized and resource-intensive relative to the specific requirements of their target domains. To address this inefficiency, this study introduces ViT-Slim, a Genetic Algorithm-based Neural Architecture Search (GA-NAS) framework for optimizing Vision Transformer (ViT) architectures. ViT-Slim balances model size against performance by using a genetic algorithm to explore and optimize design configurations. The framework integrates Dense Relative Localization (DrLoc) to inject inductive bias, which improves ViT's ability to learn from custom datasets efficiently, even when data is limited. Experimental validation on a custom depth dataset demonstrates that ViT-Slim achieves significant resource efficiency, reducing memory usage by up to 79.18% and parameters by 83.09%, with less than 1% accuracy loss compared to the baseline ViT-Small.
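The size-versus-accuracy search described in the abstract can be illustrated with a minimal genetic-algorithm sketch. Everything below is an assumption made for illustration and is not taken from the paper: the search space (`SEARCH_SPACE`), the rough encoder parameter estimate (`estimate_params`), the weighted fitness (`fitness`, `size_weight`), and in particular `toy_accuracy`, which is a stand-in for actually training and validating each candidate. DrLoc is not modeled here.

```python
import math
import random

# Hypothetical search space; the abstract does not list the actual genes.
SEARCH_SPACE = {
    "depth": [4, 6, 8, 10, 12],
    "embed_dim": [96, 128, 192, 256, 384],
    "num_heads": [2, 4, 8],
    "mlp_ratio": [1, 2, 3, 4],
}
# ViT-Small-like reference configuration, used only to normalize model size.
BASELINE = {"depth": 12, "embed_dim": 384, "num_heads": 6, "mlp_ratio": 4}


def estimate_params(cfg):
    """Rough encoder weight count: attention (4*d^2) plus MLP (2*r*d^2) per block.
    Biases, norms, patch embedding and the classifier head are ignored."""
    d = cfg["embed_dim"]
    return cfg["depth"] * (4 + 2 * cfg["mlp_ratio"]) * d * d


def toy_accuracy(cfg):
    """Placeholder for train-and-validate: a made-up curve that saturates with size."""
    size = estimate_params(cfg) / estimate_params(BASELINE)
    return 0.9 * (1.0 - math.exp(-8.0 * size))


def fitness(cfg, accuracy_fn, size_weight=0.3):
    """Scalarized objective: reward accuracy, penalize relative parameter count."""
    size = estimate_params(cfg) / estimate_params(BASELINE)
    return accuracy_fn(cfg) - size_weight * size


def random_config(rng):
    return {gene: rng.choice(options) for gene, options in SEARCH_SPACE.items()}


def crossover(a, b, rng):
    """Uniform crossover: each gene is copied from one parent at random."""
    return {gene: rng.choice([a[gene], b[gene]]) for gene in SEARCH_SPACE}


def mutate(cfg, rng, rate=0.2):
    """Resample each gene from the search space with probability `rate`."""
    return {
        gene: rng.choice(SEARCH_SPACE[gene]) if rng.random() < rate else value
        for gene, value in cfg.items()
    }


def tournament(population, score, rng, k=3):
    """Pick the fittest of k randomly drawn individuals."""
    return max(rng.sample(population, k), key=score)


def evolve(accuracy_fn, pop_size=20, generations=15, elite=2, seed=0):
    rng = random.Random(seed)

    def score(cfg):
        return fitness(cfg, accuracy_fn)

    population = [random_config(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        survivors = population[:elite]  # elitism: carry the best forward unchanged
        children = []
        while len(children) < pop_size - elite:
            parent_a = tournament(population, score, rng)
            parent_b = tournament(population, score, rng)
            children.append(mutate(crossover(parent_a, parent_b, rng), rng))
        population = survivors + children
    return max(population, key=score)


if __name__ == "__main__":
    best = evolve(toy_accuracy)
    ratio = estimate_params(best) / estimate_params(BASELINE)
    print(best, f"relative size: {ratio:.2f}")
```

In a real search, `toy_accuracy` would be replaced by a function that trains (or partially trains) the candidate and returns its validation accuracy, and the single weighted score could be swapped for a Pareto-based multi-objective selection.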
| Original language | English |
|---|---|
| Title of host publication | Proceedings - 2025 IEEE Conference on Artificial Intelligence, CAI 2025 |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| Pages | 796-802 |
| Number of pages | 7 |
| ISBN (Electronic) | 9798331524005 |
| State | Published - 2025 |
| Event | 3rd IEEE Conference on Artificial Intelligence, CAI 2025 - Santa Clara, United States; Duration: 5 May 2025 → 7 May 2025 |
Publication series
| Name | Proceedings - 2025 IEEE Conference on Artificial Intelligence, CAI 2025 |
|---|---|
Conference
| Conference | 3rd IEEE Conference on Artificial Intelligence, CAI 2025 |
|---|---|
| Country/Territory | United States |
| City | Santa Clara |
| Period | 5 May 2025 → 7 May 2025 |
Bibliographical note
Publisher Copyright: © 2025 IEEE.
Keywords
- Dense Relative Localization (DrLoc)
- Genetic Algorithm-based Neural Architecture Search (GA-NAS)
- Lightweight Vision Transformers
- Multi-Objective Model Optimization
- Resource-Efficient AI Models
- Vision Transformer (ViT)