conference paper

Fundamental limits of learning in sequence multi-index models and deep attention networks: high-dimensional asymptotics and sharp thresholds

Troiani, Emanuele • Cui, Hugo Chao • Dandi, Yatin • et al.
May 1, 2025
Proceedings of the 42nd International Conference on Machine Learning
42nd International Conference on Machine Learning, ICML 2025

In this manuscript, we study the learning of deep attention neural networks, defined as the composition of multiple self-attention layers, with tied and low-rank weights. We first establish a mapping of such models to sequence multi-index models, a generalization of the widely studied multi-index model to sequential covariates, for which we establish a number of general results. In the context of Bayes-optimal learning, in the limit of large dimension D and proportionally large number of samples N, we derive a sharp asymptotic characterization of the optimal performance as well as the performance of the best-known polynomial-time algorithm for this setting, namely approximate message-passing, and characterize sharp thresholds on the minimal sample complexity required for better-than-random prediction performance. Our analysis uncovers, in particular, how the different layers are learned sequentially. Finally, we discuss how this sequential learning can also be observed in a realistic setup.
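
As a rough illustration of the architecture the abstract describes, the following sketch composes several self-attention layers whose query and key maps share a single low-rank weight matrix. The parameterization here (softmax attention, an identity skip term, the 1/sqrt(D) scaling, and the names `tied_low_rank_attention` and `deep_attention`) is an assumption made for illustration only and may differ from the definitions used in the paper.

```python
import numpy as np

def softmax_rows(scores):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = scores - scores.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def tied_low_rank_attention(x, w, skip=1.0):
    """One self-attention layer with tied, low-rank weights (illustrative form).

    x: (seq_len, dim) sequence of tokens; w: (dim, rank) with rank << dim.
    Query and key share w, so the attention matrix depends on x only
    through the low-dimensional projections x @ w.
    """
    dim = x.shape[1]
    h = x @ w / np.sqrt(dim)          # (seq_len, rank) projections
    attn = softmax_rows(h @ h.T)      # (seq_len, seq_len) attention matrix
    return (skip * np.eye(x.shape[0]) + attn) @ x

def deep_attention(x, weights, skip=1.0):
    """Compose one attention layer per weight matrix in `weights`."""
    for w in weights:
        x = tied_low_rank_attention(x, w, skip)
    return x

# Hypothetical sizes, chosen only to keep the example small.
rng = np.random.default_rng(0)
seq_len, dim, rank, depth = 4, 64, 2, 3
x = rng.standard_normal((seq_len, dim))
weights = [rng.standard_normal((dim, rank)) for _ in range(depth)]
y = deep_attention(x, weights)
print(y.shape)  # (4, 64)
```

In this sketch the first layer's attention matrix is a function of the rank-many projections of each token only, which is the kind of structure that makes a description in terms of a multi-index model over sequences plausible.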

Name: 6949_Fundamental_limits_of_lea.pdf
Type: Main Document
Version: Published version
Access type: openaccess
License Condition: N/A
Size: 890.84 KB
Format: Adobe PDF
Checksum (MD5): bcfcfec874f68c503b45de0493f03282

Name: 2502.00901v1.pdf
Type: Main Document
Version: Submitted version (Preprint)
Access type: openaccess
License Condition: N/A
Size: 1.21 MB
Format: Adobe PDF
Checksum (MD5): ef3c01f45fe83ebc5365aa8e7942f94d
