Infoscience
 
conference paper

Attention is not all you need: pure attention loses rank doubly exponentially with depth

Dong, Yihe • Cordonnier, Jean-Baptiste • Loukas, Andreas
January 1, 2021
International Conference On Machine Learning, Vol 139
International Conference on Machine Learning (ICML)

Attention-based architectures have become ubiquitous in machine learning. Yet our understanding of the reasons for their effectiveness remains limited. This work proposes a new way to understand self-attention networks: we show that their output can be decomposed into a sum of smaller terms, or paths, each involving the operation of a sequence of attention heads across layers. Using this path decomposition, we prove that self-attention possesses a strong inductive bias towards "token uniformity". Specifically, without skip connections or multi-layer perceptrons (MLPs), the output converges doubly exponentially to a rank-1 matrix. Conversely, skip connections and MLPs keep the output from degenerating. Our experiments verify the convergence results on standard transformer architectures.
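The collapse described in the abstract is easy to observe numerically. The following is a minimal sketch (not the authors' code; all sizes, weight scalings, and the depth of 6 layers are illustrative assumptions): a stack of single-head self-attention layers with random weights, with no skip connections and no MLPs, is applied repeatedly, and the relative distance of the output to the nearest constant-row rank-1 matrix is tracked.

```python
# Illustrative sketch of the paper's "token uniformity" claim: pure
# self-attention (no skips, no MLPs) drives the output toward rank 1.
import numpy as np

rng = np.random.default_rng(0)
n_tokens, d = 8, 16  # illustrative sizes, not from the paper

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def attention_layer(X, Wq, Wk, Wv):
    # Single-head self-attention: softmax(Q K^T / sqrt(d)) V,
    # deliberately *without* a residual connection or MLP.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    A = softmax(Q @ K.T / np.sqrt(d))
    return A @ V

def uniformity_ratio(X):
    # Relative distance of X to the rank-1 matrix whose rows all equal
    # the mean row; small values indicate near token uniformity.
    res = X - X.mean(axis=0, keepdims=True)
    return np.linalg.norm(res) / np.linalg.norm(X)

X = rng.normal(size=(n_tokens, d))
ratios = [uniformity_ratio(X)]
for _ in range(6):
    Wq, Wk, Wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
    X = attention_layer(X, Wq, Wk, Wv)
    ratios.append(uniformity_ratio(X))

# The paper predicts this sequence decays rapidly with depth.
print([round(r, 4) for r in ratios])
```

Adding back a skip connection (returning `X + A @ V` in `attention_layer`) is the simplest way to see the counterpoint: with the residual path, the ratio no longer collapses the same way.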

Details
Type
conference paper
Web of Science ID

WOS:000683104602074

Author(s)
Dong, Yihe
Cordonnier, Jean-Baptiste  
Loukas, Andreas  
Date Issued

2021-01-01

Publisher

JMLR (Journal of Machine Learning Research)

Publisher place

San Diego

Published in
International Conference On Machine Learning, Vol 139
Series title/Series vol.

Proceedings of Machine Learning Research

Volume

139

Editorial or Peer reviewed

REVIEWED

Written at

EPFL

EPFL units
MLO  
Event name
International Conference on Machine Learning (ICML)
Event place
Online (virtual conference)
Event date
Jul 18-24, 2021

Available on Infoscience
September 25, 2021
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/181645
  • Contact
  • infoscience@epfl.ch

Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved.