Infoscience

conference paper

HyperMixer: An MLP-based Low Cost Alternative to Transformers

Transformer-based architectures are the model of choice for natural language understanding, but they come at a significant cost, as they have quadratic complexity in the input length, require a lot of training data, and can be difficult to tune. In the pursuit of lower costs, we investigate simple MLP-based architectures. We find that existing architectures such as MLPMixer, which achieves token mixing through a static MLP applied to each feature independently, are too detached from the inductive biases required for natural language understanding. In this paper, we propose a simple variant, HyperMixer, which forms the token mixing MLP dynamically using hypernetworks. Empirically, we demonstrate that our model performs better than alternative MLP-based models, and on par with Transformers. In contrast to Transformers, HyperMixer achieves these results at substantially lower costs in terms of processing time, training data, and hyperparameter tuning.
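
For intuition, here is a minimal PyTorch sketch of the token-mixing step the abstract describes: two hypernetworks generate the token-mixing weights from the tokens themselves, rather than using MLPMixer's static, fixed-size mixing MLP. The class and variable names below are ours, and details of the published model (such as position embeddings in the hypernetworks, weight tying, layer normalization, and residual connections) are omitted.

    import torch
    import torch.nn as nn

    class HyperMixerTokenMixing(nn.Module):
        """Sketch of HyperMixer-style token mixing (names are illustrative)."""

        def __init__(self, d_model: int, d_hidden: int):
            super().__init__()
            # Hypernetworks: applied position-wise, each maps a token
            # embedding to one row of a token-mixing weight matrix.
            self.hyper_w1 = nn.Linear(d_model, d_hidden)
            self.hyper_w2 = nn.Linear(d_model, d_hidden)
            self.activation = nn.GELU()

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, n_tokens, d_model); n_tokens may vary per input
            w1 = self.hyper_w1(x)  # (batch, n_tokens, d_hidden)
            w2 = self.hyper_w2(x)  # (batch, n_tokens, d_hidden)
            # Mix over the token axis, not the feature axis:
            hidden = self.activation(w1.transpose(1, 2) @ x)  # (batch, d_hidden, d_model)
            return w2 @ hidden  # (batch, n_tokens, d_model)

    # Example: 2 sequences of 128 tokens with 256-dimensional embeddings
    x = torch.randn(2, 128, 256)
    mixer = HyperMixerTokenMixing(d_model=256, d_hidden=512)
    print(mixer(x).shape)  # torch.Size([2, 128, 256])

Because the generated weight matrices have one row per token, each matrix product above is linear in the number of tokens, in contrast to the quadratic cost of Transformer self-attention noted in the abstract.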

Details

Type
conference paper
Web of Science ID

WOS:001190962507024

Author(s)
Mai, Florian
•
Pannatier, Arnaud  
•
Fehr, Fabio  
•
Chen, Haolin  
•
Marelli, Francois
•
Fleuret, Francois  
•
Henderson, James
Editors
Rogers, A
•
Boyd-Graber, J
•
Okazaki, N
Date Issued

2023-01-01

Publisher

Association for Computational Linguistics (ACL)

Publisher place

Stroudsburg

Published in
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023): Long Papers, Vol. 1
ISBN of the book

978-1-959429-72-2

Start page

15632

End page

15654

Subjects

Technology

Peer reviewed

Reviewed

Written at

EPFL

EPFL units
LIDIAP  
Event name
61st Annual Meeting of the Association for Computational Linguistics (ACL)

Event place
Toronto, Canada

Event date
July 9-14, 2023

Funder and grant number

  • Swiss National Science Foundation, under the project LAOS: 200021_178862
  • Swiss Innovation Agency Innosuisse: 32432.1 IP-ICT
  • Swiss National Centre of Competence in Research (NCCR): 51NF40_180888

Available on Infoscience
May 1, 2024
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/207634