Side-Tuning: A Baseline for Network Adaptation via Additive Side Networks
When training a neural network for a desired task, one may prefer to adapt a pre-trained network rather than starting from randomly initialized weights. Adaptation can be useful when training data is scarce, when a single learner needs to perform multiple tasks, or when one wishes to encode priors in the network. The most commonly employed approaches for network adaptation are fine-tuning and using the pre-trained network as a fixed feature extractor. In this paper, we propose a straightforward alternative: side-tuning. Side-tuning adapts a pre-trained network by training a lightweight "side" network that is fused with the (unchanged) pre-trained network via summation. This simple method works as well as or better than existing solutions and resolves some of the basic issues with fine-tuning, fixed features, and other common approaches. In particular, side-tuning is less prone to overfitting, is asymptotically consistent, and does not suffer from catastrophic forgetting in incremental learning. We demonstrate the performance of side-tuning under a diverse set of scenarios, including incremental learning (iCIFAR, iTaskonomy), reinforcement learning, imitation learning (visual navigation in Habitat), NLP question-answering (SQuAD v2), and single-task transfer learning (Taskonomy), with consistently promising results.
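The fusion described in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the `base` and `side` callables are toy stand-ins for the frozen pre-trained network and the lightweight side network, and the blending weight `alpha` is an assumption added for illustration (with `alpha = 0.5`, the fusion reduces to a scaled summation).

```python
class SideTuned:
    """Hypothetical sketch of side-tuning: a frozen base network
    fused with a small trainable side network by (weighted) summation."""

    def __init__(self, base, side, alpha=0.5):
        self.base = base    # pre-trained network, kept frozen during adaptation
        self.side = side    # lightweight side network, the only part trained
        self.alpha = alpha  # blending weight; 0.5 is a plain (scaled) sum

    def __call__(self, x):
        # Only self.side would receive gradient updates during training;
        # self.base's output is treated as a fixed additive feature.
        return self.alpha * self.base(x) + (1 - self.alpha) * self.side(x)


# Toy stand-ins for the two networks:
base = lambda x: 2.0 * x   # "pre-trained" function (frozen)
side = lambda x: 0.5 * x   # "side" function (trainable)

model = SideTuned(base, side, alpha=0.5)
print(model(4.0))  # 0.5 * 8.0 + 0.5 * 2.0 = 5.0
```

Because the base network is never modified, its original behavior is always recoverable (no catastrophic forgetting), and as the side network's capacity grows the combined model can fit the new task arbitrarily well (asymptotic consistency).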
WOS:001500572400041
University of California System
University of California System
École Polytechnique Fédérale de Lausanne
Stanford University
University of California System
2020-12-03
Cham
978-3-030-58579-2
978-3-030-58580-8
Part III
Lecture Notes in Computer Science; 12348
0302-9743
698-714
REVIEWED
EPFL
Event name | Event acronym | Event place | Event date |
| ECCV 2020 | Glasgow, UK | 2020-08-23 - 2020-08-28 |
Funder | Funding(s) | Grant Number | Grant URL |
MURI | | N00014-14-1-0671 | |
Vannevar Bush Faculty Fellowship | | | |
Amazon AWS Machine Learning Award | | | |