On the Relationship between Self-Attention and Convolutional Layers

Cordonnier, Jean-Baptiste; Loukas, Andreas; Jaggi, Martin

2020

Abstract

Recent trends of incorporating attention mechanisms in vision have led researchers to reconsider the supremacy of convolutional layers as a primary building block. Beyond helping CNNs to handle long-range dependencies, Ramachandran et al. (2019) showed that attention can completely replace convolution and achieve state-of-the-art performance on vision tasks. This raises the question: do learned attention layers operate similarly to convolutional layers? This work provides evidence that attention layers can perform convolution and, indeed, they often learn to do so in practice. Specifically, we prove that a multi-head self-attention layer with a sufficient number of heads is at least as expressive as any convolutional layer. Our numerical experiments then show that self-attention layers attend to pixel-grid patterns similarly to CNN layers, corroborating our analysis. Our code is publicly available.
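
The expressivity claim in the abstract can be pictured with a small numerical sketch. It is an illustration under stated assumptions, not the authors' proof or their released code. It assumes one attention head per kernel offset (9 heads for a 3 x 3 kernel), attention scores that depend only on the relative position of query and key pixels through a quadratic penalty with a large sharpness `alpha`, and a per-head value/output matrix equal to the kernel slice for that offset. All names (`conv2d`, `attention_as_conv`, `kernel`, `alpha`) are hypothetical. Only interior pixels are compared, because the softmax handles image borders differently from zero padding.

```python
import math
import random

def conv2d(X, kernel):
    """Zero-padded convolution (cross-correlation convention).
    X[i][j] is a list of D_in values; kernel[(dy, dx)] is a D_in x D_out matrix."""
    H, W = len(X), len(X[0])
    D_out = len(next(iter(kernel.values()))[0])
    out = [[[0.0] * D_out for _ in range(W)] for _ in range(H)]
    for i in range(H):
        for j in range(W):
            for (dy, dx), M in kernel.items():
                y, x = i + dy, j + dx
                if 0 <= y < H and 0 <= x < W:
                    for a, v in enumerate(X[y][x]):
                        for b in range(D_out):
                            out[i][j][b] += v * M[a][b]
    return out

def attention_as_conv(X, kernel, alpha=50.0):
    """Multi-head self-attention with one head per kernel offset.
    Assumption: scores use only relative positions, so each head's softmax
    concentrates on the key pixel at query + (dy, dx) when alpha is large."""
    H, W = len(X), len(X[0])
    D_out = len(next(iter(kernel.values()))[0])
    pixels = [(i, j) for i in range(H) for j in range(W)]
    out = [[[0.0] * D_out for _ in range(W)] for _ in range(H)]
    for (dy, dx), M in kernel.items():  # one head per offset
        for qi, qj in pixels:
            scores = [-alpha * ((ki - qi - dy) ** 2 + (kj - qj - dx) ** 2)
                      for ki, kj in pixels]
            top = max(scores)
            exps = [math.exp(s - top) for s in scores]
            total = sum(exps)
            for e, (ki, kj) in zip(exps, pixels):
                w = e / total  # attention probability of this key
                for a, v in enumerate(X[ki][kj]):
                    for b in range(D_out):
                        out[qi][qj][b] += w * v * M[a][b]  # heads are summed
    return out

random.seed(0)
H, W, D_in, D_out = 5, 5, 2, 3
X = [[[random.uniform(-1, 1) for _ in range(D_in)] for _ in range(W)] for _ in range(H)]
kernel = {(dy, dx): [[random.uniform(-1, 1) for _ in range(D_out)] for _ in range(D_in)]
          for dy in (-1, 0, 1) for dx in (-1, 0, 1)}

conv_out = conv2d(X, kernel)
attn_out = attention_as_conv(X, kernel)

# Compare interior pixels only (borders differ: softmax vs. zero padding).
max_err = max(abs(conv_out[i][j][b] - attn_out[i][j][b])
              for i in range(1, H - 1) for j in range(1, W - 1) for b in range(D_out))
print(max_err)
```

In this sketch the number of heads equals the number of kernel offsets, which is one reading of "sufficient number of heads": a K x K kernel would need K * K heads under these assumptions.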

Details

Title On the Relationship between Self-Attention and Convolutional Layers

Author(s) Cordonnier, Jean-Baptiste ; Loukas, Andreas ; Jaggi, Martin

Pagination 18

Conference Eighth International Conference on Learning Representations - ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020

Date 2020

Keywords

self-attention; transformers; convolution; expressivity; ml-ai

Additional link Code; Interactive website; OpenReview

Laboratories MLO; LTS2

Record Appears in Scientific production and competences > I&C - School of Computer and Communication Sciences > IINFCOM > MLO - Machine Learning and Optimization Laboratory
Scientific production and competences > STI - School of Engineering > IEM - Institut d'Electricité et de Microtechnique > LTS2 - Signal Processing Laboratory 2
Peer-reviewed publications
Conference Papers
Work produced at EPFL

Record creation date 2020-01-10
