Infoscience

conference poster

Fundamental Limits of Prompt Compression: A Rate-Distortion Framework for Black-Box Language Models

Girish, Adway • Nagle, Alliot • Bondaschi, Marco • Gastpar, Michael • Makkuva, Ashok Vardhan • Kim, Hyeji
September 25, 2024
Advances in Neural Information Processing Systems 37 (NeurIPS 2024)
38th Annual Conference on Neural Information Processing Systems

We formalize the problem of prompt compression for large language models (LLMs) and present a framework to unify token-level prompt compression methods that create hard prompts for black-box models. We derive the distortion-rate function for this setup as a linear program, and provide an efficient algorithm to compute this fundamental limit via the dual of the linear program. Using the distortion-rate function as the baseline, we study the performance of existing compression schemes on a synthetic dataset consisting of prompts generated from a Markov chain, natural language queries, and their respective answers. Our empirical analysis demonstrates the criticality of query-aware prompt compression, where the compressor has knowledge of the downstream task/query for the black-box LLM. We show that there is a large gap between the performance of current prompt compression methods and the optimal strategy, and propose Adaptive QuerySelect, a query-aware, variable-rate adaptation of a prior work to close the gap. We extend our experiments to a small natural language dataset to further confirm our findings from the synthetic dataset.
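The abstract states that the distortion-rate function of this setup is a linear program, computed efficiently via its dual. As a hedged illustration only (not the paper's algorithm; the variable names and toy values below are invented for this sketch), the following Python snippet solves one plausible primal form of such an LP with scipy.optimize.linprog: minimize the expected distortion over randomized maps q(m|x) from prompts to compressed prompts, subject to a budget on the expected compressed length.

```python
import numpy as np
from scipy.optimize import linprog

# Toy instance (illustrative values, NOT from the paper):
# three prompts, four candidate compressed prompts.
p = np.array([0.5, 0.3, 0.2])            # prompt distribution p(x)
length = np.array([1.0, 2.0, 2.0, 3.0])  # token length of each compressed prompt m
d = np.array([                           # distortion d(x, m)
    [0.9, 0.4, 0.5, 0.0],
    [0.8, 0.1, 0.6, 0.0],
    [0.7, 0.6, 0.2, 0.0],
])
n_x, n_m = d.shape

def distortion_rate(rate_budget):
    """Minimize E[d(x, m)] over randomized maps q(m|x), subject to
    E[length(m)] <= rate_budget -- a linear program in the entries of q."""
    c = (p[:, None] * d).ravel()                          # objective: sum_{x,m} p(x) q(m|x) d(x, m)
    A_ub = (p[:, None] * length[None, :]).reshape(1, -1)  # rate row: sum_{x,m} p(x) q(m|x) len(m)
    A_eq = np.zeros((n_x, n_x * n_m))                     # each q(.|x) must sum to 1
    for i in range(n_x):
        A_eq[i, i * n_m:(i + 1) * n_m] = 1.0
    res = linprog(c, A_ub=A_ub, b_ub=[rate_budget],
                  A_eq=A_eq, b_eq=np.ones(n_x),
                  bounds=(0, 1), method="highs")
    return res.fun

# Sweep the rate budget to trace out a toy distortion-rate curve.
for r in (1.0, 1.5, 2.0, 3.0):
    print(f"rate {r:.1f} -> optimal expected distortion {distortion_rate(r):.3f}")
```

Solving the primal directly is fine at this toy scale; the paper's contribution, per the abstract, is an efficient algorithm that computes the same fundamental limit through the dual of the linear program.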

Type
conference poster

ArXiv ID
2407.15504

Author(s)
Girish, Adway (EPFL)
Nagle, Alliot (The University of Texas at Austin)
Bondaschi, Marco (EPFL)
Gastpar, Michael (EPFL)
Makkuva, Ashok Vardhan (EPFL)
Kim, Hyeji (The University of Texas at Austin)

Date Issued
2024-09-25

Publisher
Curran Associates, Inc.

Published in
Advances in Neural Information Processing Systems 37 (NeurIPS 2024)

ISBN of the book
9798331314385

Subjects
Computer Science - Learning • Computer Science - Computation and Language • Computer Science - Information Theory • Mathematics - Information Theory

Editorial or Peer reviewed
REVIEWED

Written at
EPFL

EPFL units
LTHI • LINX
Event name
38th Annual Conference on Neural Information Processing Systems

Event acronym
NeurIPS

Event place
Vancouver Convention Center, Canada

Event date
2024-12-10 - 2024-12-15

Available on Infoscience
April 4, 2025
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/248663