Abstract
Retrieval-Augmented Generation (RAG) enhances language models by incorporating external knowledge at inference time. However, its performance is highly sensitive to the quality of the retrieved content, which often includes noisy or irrelevant distractors. Conventional latent representations for clustering-based retrieval are often poorly structured and misaligned with generation objectives. We propose DS-CAE, a Dual-Stream Cross-Attentive Autoencoder that jointly encodes global and local semantics via Transformer and BiLSTM encoders. The two streams are fused with bidirectional cross-attention and token-wise gating for context-aware integration. Retrieval is improved by a composite loss that combines reconstruction with an adaptive-margin triplet term, and by Gaussian mixture model (GMM) filtering based on reconstruction loss and cluster distance. Experiments on Natural Questions (NQ), WebQuestions, and CuratedTREC show that DS-CAE outperforms previous RAG models on CuratedTREC by up to 6.8% and performs competitively on NQ with a compact 171M-parameter model. Ablations validate each component's impact on cluster-aware retrieval.
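To make the composite objective concrete, the following is a minimal sketch of a reconstruction term combined with a triplet term whose margin can vary per triplet. The abstract does not specify how the margin adapts, how the terms are weighted, or which distance is used, so `linear_margin`, `weight`, the Euclidean distance, and every function name here are hypothetical placeholders rather than the authors' formulation.

```python
import math


def mse(x, x_hat):
    """Mean squared reconstruction error between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def euclidean(u, v):
    """Euclidean distance between two latent vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def linear_margin(d_ap, d_an, base=0.2, scale=0.1):
    """Hypothetical margin rule (an assumption, not taken from the paper):
    the margin grows with the anchor-positive distance; d_an is unused here."""
    return base + scale * d_ap


def composite_loss(x, x_hat, z_anchor, z_pos, z_neg,
                   weight=1.0, margin_fn=linear_margin):
    """Reconstruction loss plus a weighted triplet hinge with a per-triplet margin."""
    rec = mse(x, x_hat)
    d_ap = euclidean(z_anchor, z_pos)
    d_an = euclidean(z_anchor, z_neg)
    margin = margin_fn(d_ap, d_an)
    triplet = max(0.0, d_ap - d_an + margin)
    return rec + weight * triplet
```

Passing the margin rule as `margin_fn` keeps the unknown part isolated: any other adaptation scheme can be substituted without touching the rest of the loss.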
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 2025 IEEE International Conference on Communications, Computing, Cybersecurity and Informatics, CCCI 2025 |
| Editors | Mohammad S. Obaidat, Lin Zhang, Petros Nicopolitidis, Yu Guo, Xinyu Zhang |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| ISBN (Electronic) | 9798331501969 |
| State | Published - 2025 |
| Event | 2025 IEEE International Conference on Communications, Computing, Cybersecurity and Informatics, CCCI 2025 - Hangzhou, China; Duration: 15 Oct 2025 → 17 Oct 2025 |
Publication series
| Name | Proceedings of the 2025 IEEE International Conference on Communications, Computing, Cybersecurity and Informatics, CCCI 2025 |
|---|---|
Conference
| Conference | 2025 IEEE International Conference on Communications, Computing, Cybersecurity and Informatics, CCCI 2025 |
|---|---|
| Country/Territory | China |
| City | Hangzhou |
| Period | 15/10/25 → 17/10/25 |
Bibliographical note
Publisher Copyright: © 2025 IEEE.
Keywords
- Artificial Intelligence
- Cluster-Aware Retrieval
- Latent Representation
- Optimization
- Retrieval-Augmented Generation