Repository logo

Infoscience

  • English
  • French
Log In
Logo EPFL, École polytechnique fédérale de Lausanne

Infoscience

  • English
  • French
Log In
  1. Home
  2. Academic and Research Output
  3. Conferences, Workshops, Symposiums, and Seminars
  4. Transformers on Markov Data: Constant Depth Suffices
 
conference paper

Transformers on Markov Data: Constant Depth Suffices

Rajaraman, Nived
•
Bondaschi, Marco  
•
Ramchandran, Kannan
Show more
Globerson, A
•
Mackey, L
Show more
January 1, 2024
Advances In Neural Information Processing Systems 37 (Neurips 2024)
38th Annual Conference on Neural Information Processing Systems

Attention-based transformers have been remarkably successful at modeling generative processes across various domains and modalities. In this paper, we study the behavior of transformers on data drawn from k(th)-order Markov processes, where the conditional distribution of the next symbol in a sequence depends on the previous k symbols observed. We observe a surprising phenomenon empirically which contradicts previous findings: when trained for sufficiently long, a transformer with a fixed depth and 1 head per layer is able to achieve low test loss on sequences drawn from k(th)-order Markov sources, even as k grows. Furthermore, this low test loss is achieved by the transformer's ability to represent and learn the in-context conditional empirical distribution. On the theoretical side, our main result is that a transformer with a single head and three layers can represent the in-context conditional empirical distribution for k(th)-order Markov sources, concurring with our empirical observations. Along the way, we prove that attention-only transformers with O(log(2)(k)) layers can represent the in-context conditional empirical distribution by composing induction heads to track the previous k symbols in the sequence. These results provide more insight into our current understanding of the mechanisms by which transformers learn to capture context, by understanding their behavior on Markov sources. Code is available at: https://github.com/Bond1995/Constant-depth-Transformers.

  • Details
  • Metrics
Type
conference paper
Web of Science ID

WOS:001633253000212

Author(s)
Rajaraman, Nived

University of California System

Bondaschi, Marco  

École Polytechnique Fédérale de Lausanne

Ramchandran, Kannan

University of California System

Gastpar, Michael  

École Polytechnique Fédérale de Lausanne

Makkuva, Ashok Vardhan  

École Polytechnique Fédérale de Lausanne

Editors
Globerson, A
•
Mackey, L
•
Belgrave, D
•
Fan, A
•
Paquet, U
•
Tomczak, J
•
Zhang, C
Date Issued

2024-01-01

Publisher

Neural Information Processing Systems (Nips)

Publisher place

La Jolla

Published in
Advances In Neural Information Processing Systems 37 (Neurips 2024)
ISBN of the book

979-8-3313-1438-5

Series title/Series vol.

Advances in Neural Information Processing Systems; 37

ISSN (of the series)

1049-5258

Editorial or Peer reviewed

REVIEWED

Written at

EPFL

EPFL units
LINX  
Event nameEvent acronymEvent placeEvent date
38th Annual Conference on Neural Information Processing Systems

NeurIPS 2024

Vancouver Convention Center

2024-12-10 - 2024-12-15

FunderFunding(s)Grant NumberGrant URL

National Science Foundation (NSF)

CCF-2211209

Swiss National Science Foundation (SNSF)

200364

Available on Infoscience
February 24, 2026
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/260694
Logo EPFL, École polytechnique fédérale de Lausanne
  • Contact
  • infoscience@epfl.ch

  • Follow us on Facebook
  • Follow us on Instagram
  • Follow us on LinkedIn
  • Follow us on X
  • Follow us on Youtube
AccessibilityLegal noticePrivacy policyCookie settingsEnd User AgreementGet helpFeedback

Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, tous droits réservés