Infoscience
conference paper not in proceedings

On the Convergence of Encoder-only Shallow Transformers

Wu, Yongtao • Liu, Fanghui • Chrysos, Grigorios • Cevher, Volkan
2023
37th Annual Conference on Neural Information Processing Systems

In this paper, we aim to build a global convergence theory for encoder-only shallow Transformers under a realistic setting in terms of architecture, initialization, and scaling, in the finite-width regime. The difficulty lies in how to tackle the softmax in the self-attention mechanism, the core ingredient of the Transformer. In particular, we diagnose the scaling scheme, carefully handle the input and output of the softmax, and prove that quadratic overparameterization is sufficient for the global convergence of our shallow Transformers under the He/LeCun initialization commonly used in practice. In addition, a neural tangent kernel (NTK) based analysis is also given, which facilitates a comprehensive comparison. Our theory demonstrates the separation in importance between different scaling schemes and initializations. We believe our results can pave the way for a better understanding of modern Transformers, particularly of their training dynamics.
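To make the objects in the abstract concrete, the following is a minimal NumPy sketch of a single-head, encoder-only shallow Transformer forward pass with He/LeCun-initialized weights and an explicit softmax scaling factor. All names, dimensions, and the mean-pooled scalar output are illustrative assumptions, not the paper's exact architecture; `scale` stands in for the scaling scheme the paper analyzes.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def init_he(fan_in, fan_out, rng):
    # He initialization: Gaussian with variance 2 / fan_in.
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

def init_lecun(fan_in, fan_out, rng):
    # LeCun initialization: Gaussian with variance 1 / fan_in.
    return rng.normal(0.0, np.sqrt(1.0 / fan_in), size=(fan_in, fan_out))

def shallow_encoder_forward(X, params, scale):
    # X: (seq_len, d) token embeddings; single-head self-attention.
    WQ, WK, WV, w_out = params
    Q, K, V = X @ WQ, X @ WK, X @ WV
    A = softmax(scale * (Q @ K.T))   # attention weights; rows sum to 1
    H = A @ V                        # attention output, (seq_len, d_m)
    return (H @ w_out).mean()        # mean-pooled scalar output

rng = np.random.default_rng(0)
d, d_m, n = 8, 32, 5                 # embedding dim, width, sequence length
params = (init_he(d, d_m, rng), init_he(d, d_m, rng),
          init_he(d, d_m, rng), init_lecun(d_m, 1, rng))
X = rng.normal(size=(n, d))
y = shallow_encoder_forward(X, params, scale=1.0 / np.sqrt(d_m))
```

The choice of `scale` (e.g. 1/sqrt(d_m) versus 1/d_m) is exactly the kind of scaling-scheme distinction whose effect on convergence the paper studies.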

Type
conference paper not in proceedings
ArXiv ID

https://arxiv.org/abs/2311.01575

Author(s)
Wu, Yongtao  
Liu, Fanghui  
Chrysos, Grigorios  
Cevher, Volkan  orcid-logo
Date Issued

2023

Number of pages

41

Subjects

AI-ML

Editorial or Peer reviewed

REVIEWED

Written at

EPFL

EPFL units
LIONS  
Event name
37th Annual Conference on Neural Information Processing Systems
Event place
New Orleans, USA
Event date
December 10-16, 2023

Available on Infoscience
March 14, 2024
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/206105
Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved.