Repository logo

Infoscience

  • English
  • French
Log In
Logo EPFL, École polytechnique fédérale de Lausanne

Infoscience

  • English
  • French
Log In
  1. Home
  2. Academic and Research Output
  3. Conferences, Workshops, Symposiums, and Seminars
  4. Leveraging Approximate Caching for Faster Retrieval-Augmented Generation
 
conference paper

Leveraging Approximate Caching for Faster Retrieval-Augmented Generation

Bergman, Shai
•
Ji, Zhang
•
Kermarrec, Anne-Marie  
Show more
March 31, 2025
5th Workshop on Machine Learning and Systems (EuroMLSys)

Retrieval-augmented generation (RAG) enhances the reliability of large language model (LLM) answers by integrating external knowledge. However, RAG increases the end-toend inference time since looking for relevant documents from large vector databases is computationally expensive. To address this, we introduce Proximity, an approximate key-value cache that optimizes the RAG workflow by leveraging similarities in user queries. Instead of treating each query independently, Proximity reuses previously retrieved documents when similar queries appear, reducing reliance on expensive vector database lookups. We evaluate Proximity on the MMLU and MedRAG benchmarks, demonstrating that it significantly improves retrieval efficiency while maintaining response accuracy. Proximity reduces retrieval latency by up to 59% while maintaining accuracy and lowers the computational burden on the vector database. We also experiment with different similarity thresholds and quantify the trade-off between speed and recall. Our work shows that approximate caching is a viable and effective strategy for optimizing RAG-based systems.

  • Files
  • Details
  • Metrics
Loading...
Thumbnail Image
Name

euromlsys25-final13.pdf

Type

Main Document

Version

http://purl.org/coar/version/c_970fb48d4fbd8a85

Access type

openaccess

License Condition

N/A

Size

658.87 KB

Format

Adobe PDF

Checksum (MD5)

0bfb86258ae01551b79a4f58149659f3

Logo EPFL, École polytechnique fédérale de Lausanne
  • Contact
  • infoscience@epfl.ch

  • Follow us on Facebook
  • Follow us on Instagram
  • Follow us on LinkedIn
  • Follow us on X
  • Follow us on Youtube
AccessibilityLegal noticePrivacy policyCookie settingsEnd User AgreementGet helpFeedback

Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, tous droits réservés