Abstract
This paper proposes accelerating vector similarity search, a key component of Retrieval-Augmented Generation (RAG) used to address limitations of large language models (LLMs), by leveraging DRAM-based Processing-In-Memory (PIM). As datasets expand, the distance computations in vector similarity search become increasingly memory-intensive. To tackle this challenge, we developed vector similarity search applications using both brute-force and Hierarchical Navigable Small World (HNSW) algorithms, with the distance computation accelerated through PIM. The proposed PIM implementation was emulated on an FPGA board, where verification and testing demonstrated significant performance gains. These findings highlight the promising potential of PIM commercialization and its capability to enhance LLM performance.
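To make the memory-intensive kernel concrete, the sketch below shows the distance computation at the heart of brute-force vector similarity search: every query requires a full scan over the stored vectors, and each distance evaluation is dominated by reading the vector data from memory — which is why the paper offloads this step to DRAM-based PIM. This is an illustrative pure-Python sketch only, not the authors' PIM implementation; the function names (`squared_l2`, `brute_force_search`) are placeholders chosen for this example.

```python
def squared_l2(a, b):
    # Squared Euclidean distance between two vectors.
    # This inner loop is the memory-bound kernel: it streams every
    # element of the stored vector, so its cost scales with dataset size.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def brute_force_search(query, database, k=1):
    # Exhaustive scan: compute the distance from the query to every
    # stored vector, then return the indices of the k nearest ones.
    distances = [(squared_l2(query, vec), idx)
                 for idx, vec in enumerate(database)]
    distances.sort()
    return [idx for _, idx in distances[:k]]

# Example: the nearest neighbors of [0.9, 1.1] in a tiny database.
db = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
print(brute_force_search([0.9, 1.1], db, k=2))  # → [1, 0]
```

HNSW avoids this full scan by navigating a graph of vectors and computing distances only along the search path, but each visited node still triggers the same distance kernel — so both algorithms benefit from accelerating it in memory.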
Original language | English |
---|---|
Title of host publication | 2025 International Conference on Electronics, Information, and Communication, ICEIC 2025 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
ISBN (Electronic) | 9798331510756 |
DOIs | |
State | Published - 2025 |
Event | 2025 International Conference on Electronics, Information, and Communication, ICEIC 2025 - Osaka, Japan Duration: 19 Jan 2025 → 22 Jan 2025 |
Publication series
Name | 2025 International Conference on Electronics, Information, and Communication, ICEIC 2025 |
---|
Conference
Conference | 2025 International Conference on Electronics, Information, and Communication, ICEIC 2025 |
---|---|
Country/Territory | Japan |
City | Osaka |
Period | 19/01/25 → 22/01/25 |
Bibliographical note
Publisher Copyright: © 2025 IEEE.
Keywords
- Processing-In-Memory (PIM)
- Retrieval-Augmented Generation (RAG)
- in-memory computing
- large language models (LLMs)
- vector similarity search