Infoscience

research article

Bilinear Sequence Regression: A Model for Learning from Long Sequences of High-Dimensional Tokens

Erba, Vittorio • Troiani, Emanuele • Biggio, Luca • Maillard, Antoine • Zdeborová, Lenka
June 16, 2025
Physical Review X (PRX)

Current progress in artificial intelligence is centered around so-called large language models, which consist of neural networks processing long sequences of high-dimensional vectors called tokens. Statistical physics provides powerful tools to study the functioning of learning with neural networks and has played a recognized role in the development of modern machine learning. The statistical physics approach relies on simplified and analytically tractable models of data. However, simple tractable models for long sequences of high-dimensional tokens are largely underexplored. Inspired by the crucial role that models such as the single-layer teacher-student perceptron (also known as generalized linear regression) played in the theory of fully connected neural networks, in this paper we introduce and study bilinear sequence regression (BSR) as one of the most basic models for sequences of tokens. We note that modern architectures naturally subsume the BSR model due to the skip connections. Building on recent methodological progress, we compute the Bayes-optimal generalization error for the model in the limit of long sequences of high-dimensional tokens and provide a message-passing algorithm that matches this performance. We quantify the improvement that optimal learning brings with respect to vectorizing the sequence of tokens and learning via simple linear regression. We also unveil surprising properties of the gradient descent algorithms in the BSR model.

Published by the American Physical Society, 2025.

Type
research article
DOI
10.1103/l4p2-vrxt
Author(s)
Erba, Vittorio  

École Polytechnique Fédérale de Lausanne

Troiani, Emanuele  

École Polytechnique Fédérale de Lausanne

Biggio, Luca  

École Polytechnique Fédérale de Lausanne

Maillard, Antoine
Zdeborová, Lenka  

École Polytechnique Fédérale de Lausanne

Date Issued

2025-06-16

Publisher

American Physical Society

Published in
Physical Review X (PRX)
Volume

15

Issue

2

Article Number

021092

Editorial or Peer reviewed

REVIEWED

Written at

EPFL

EPFL units
SPOC1  
Funder

Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung

Grant Number

212049, TMPFP2-210012

Available on Infoscience
June 27, 2025
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/251673