Infoscience

doctoral thesis

Universal Prediction in the Age of Large Language Models

Bondaschi, Marco  
2026

Artificial intelligence has been reshaped in recent years by the meteoric rise of large language models (LLMs). These models, which work by generating text in response to user prompts, have proved extraordinary across a wide range of tasks, from information extraction to text summarization and creative writing, from coding to reasoning and problem solving. These exceptional empirical results call for a deeper understanding of how these models are able to perform such complex tasks so well. Such understanding is crucial for identifying their limits and potential avenues for further improvement. In this thesis, we contribute to this research area by viewing LLMs such as transformers and state-space models (SSMs) through the lens of information theory, and in particular of universal prediction. Universal prediction studies the theoretical limits of predicting sequences of tokens when their probability distribution is uncertain. LLMs are particularly suitable for this kind of study because they are, in fact, predictors, trained to sequentially estimate the next word in a text. Unfortunately, prior work in universal prediction does not accommodate models like LLMs, which are first trained on a large corpus of data and then tested on unseen samples. In this work we aim to bridge this gap by contributing to both information theory and LLM research. On the information theory side, we introduce batch universal prediction as a framework to study models that are trained on batches of data. We derive fundamental learning limits and discuss the optimality of certain predictors. Then, we apply this framework to LLMs by interpreting them as universal predictors on Markov data. This type of data structure is particularly appropriate for two reasons: (i) it is a good approximation for natural language; and (ii) its optimal predictor can be derived theoretically. We mainly focus on two instances of this framework: in-distribution learning and in-context learning.
In the in-distribution case, LLMs are trained on data entirely drawn from the same Markov distribution. Here models can learn the ground-truth distribution during training and directly use it to predict test sequences at inference. Surprisingly, we discover that single-layer transformers struggle to learn this apparently simple task. We characterize their loss landscape and learning dynamics, and we identify weight initialization and weight tying as the causes of this difficulty. In the in-context case, each training sequence is generated by a randomly sampled Markov distribution. Here models can only learn the general structure of the data during training, while the actual distribution of the test data has to be estimated in-context at inference. We compare transformers and state-space models, identifying the strategies each of them uses to implement the optimal predictor, and we discuss the minimal requirements for them to succeed, in terms of depth and hidden dimensions. We conclude the thesis by proposing an alternative prediction measure that provides better theoretical guarantees for individual sequences than the classical average cross-entropy loss. Overall, we hope that our theoretically grounded framework can help demystify complex LLM architectures and provide insights for further advancements in terms of architecture design and efficiency.
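
To make the in-context setting concrete, the following is a minimal sketch, not taken from the thesis. It assumes a binary first-order Markov chain and uses the classical add-constant estimator from universal prediction (beta = 1 gives Laplace smoothing, beta = 1/2 the Krichevsky-Trofimov estimator). Whether this is exactly the predictor analyzed in the thesis is an assumption, and the function names `sample_markov`, `add_beta_log_loss`, and `entropy_rate` are hypothetical. The sketch estimates transition probabilities from the sequence seen so far and compares the resulting average log loss with the entropy rate, which is the loss of a predictor that knows the true distribution (the in-distribution ideal).

```python
import math
import random


def sample_markov(p01, p10, n, rng):
    """Sample n tokens from a binary first-order Markov chain.

    p01 = P(next = 1 | current = 0), p10 = P(next = 0 | current = 1).
    """
    x = [rng.randrange(2)]
    for _ in range(n - 1):
        flip = p01 if x[-1] == 0 else p10
        x.append(1 - x[-1] if rng.random() < flip else x[-1])
    return x


def add_beta_log_loss(seq, beta=0.5):
    """Average next-token log loss (in nats) of the add-beta estimator.

    The estimator runs sequentially and only uses transition counts
    gathered from the part of the sequence already observed.
    """
    counts = [[0, 0], [0, 0]]  # counts[prev][next]
    total = 0.0
    for prev, nxt in zip(seq, seq[1:]):
        row = counts[prev]
        prob = (row[nxt] + beta) / (row[0] + row[1] + 2 * beta)
        total -= math.log(prob)
        row[nxt] += 1  # update counts only after predicting
    return total / (len(seq) - 1)


def entropy_rate(p01, p10):
    """Entropy rate (in nats): the loss of a predictor that knows the kernel."""
    def h(p):
        return -p * math.log(p) - (1 - p) * math.log(1 - p)

    pi0 = p10 / (p01 + p10)  # stationary probability of state 0
    return pi0 * h(p01) + (1 - pi0) * h(p10)


if __name__ == "__main__":
    rng = random.Random(0)
    seq = sample_markov(0.2, 0.3, 20000, rng)
    print("in-context loss:", add_beta_log_loss(seq))
    print("entropy rate:   ", entropy_rate(0.2, 0.3))
```

On long sequences the two printed values should be close, since the extra loss of the add-constant estimator over the entropy rate shrinks as the context grows.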

Name: EPFL_TH11087.pdf
Type: Main Document
Version: Published version
Access type: openaccess
License Condition: N/A
Size: 7.8 MB
Format: Adobe PDF
Checksum (MD5): 9470ff84267c0fd8cc040b4a854d158e


Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved