HyperMixer: An MLP-based Low Cost Alternative to Transformers
Transformer-based architectures are the model of choice for natural language understanding, but they come at a significant cost, as they have quadratic complexity in the input length, require a lot of training data, and can be difficult to tune. In the pursuit of lower costs, we investigate simple MLP-based architectures. We find that existing architectures such as MLPMixer, which achieves token mixing through a static MLP applied to each feature independently, are too detached from the inductive biases required for natural language understanding. In this paper, we propose a simple variant, HyperMixer, which forms the token mixing MLP dynamically using hypernetworks. Empirically, we demonstrate that our model performs better than alternative MLP-based models, and on par with Transformers. In contrast to Transformers, HyperMixer achieves these results at substantially lower costs in terms of processing time, training data, and hyperparameter tuning.
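The abstract contrasts MLPMixer's static token-mixing MLP with HyperMixer's dynamically generated one. The following is a minimal sketch of what hypernetwork-based token mixing can look like, assuming a simplified form of the idea described above: two small hypernetworks map each token embedding to one column of the token-mixing MLP's weight matrices, so the mixing layer adapts to the input and to its length. Class and parameter names (HyperTokenMixing, d_hidden) are illustrative, not the authors' code.

```python
import torch
import torch.nn as nn

class HyperTokenMixing(nn.Module):
    """Sketch of hypernetwork-generated token mixing (illustrative only).

    MLPMixer mixes tokens with a *static* MLP applied to each feature
    column; here the weights of that MLP are instead *generated* from the
    token representations themselves.
    """

    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        # Hypernetworks: each token embedding yields one column of W1 / W2.
        self.hyper_w1 = nn.Linear(d_model, d_hidden)
        self.hyper_w2 = nn.Linear(d_model, d_hidden)
        self.activation = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_tokens, d_model)
        w1 = self.hyper_w1(x)   # (batch, N, d_hidden)
        w2 = self.hyper_w2(x)   # (batch, N, d_hidden)
        # Token mixing per feature column: project tokens -> hidden
        # (W1^T x), apply nonlinearity, project back to tokens (W2 ...).
        hidden = self.activation(torch.einsum('bnh,bnd->bhd', w1, x))
        mixed = torch.einsum('bnh,bhd->bnd', w2, hidden)
        return mixed
```

Because the weight matrices are produced per input, the layer is not tied to a fixed sequence length, and the cost grows linearly with the number of tokens rather than quadratically as in self-attention.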
WOS ID: WOS:001190962507024
Publication date: 2023-01-01
Publisher place: Stroudsburg
ISBN: 978-1-959429-72-2
Pages: 15632-15654
Status: REVIEWED
Institution: EPFL
Event name | Event place | Event date |
| Toronto, CANADA | JUL 09-14, 2023 |
Funder | Grant Number |
Swiss National Science Foundation under the project LAOS | 200021_178862 |
Swiss Innovation Agency Innosuisse | 32432.1 IP-ICT |
Swiss National Centre of Competence in Research (NCCR) | 51NF40_180888 |