AttEntropy: On the Generalization Ability of Supervised Semantic Segmentation Transformers to New Objects in New Domains
In addition to impressive performance, vision transformers have demonstrated remarkable abilities to encode information they were not trained to extract. For example, this information can be used to perform segmentation or single-view depth estimation even though the networks were only trained for image recognition. We show that a similar phenomenon occurs when explicitly training transformers for semantic segmentation in a supervised manner for a set of categories: Once trained, they provide valuable information even about categories absent from the training set. This information can be used to segment objects from these never-seen-before classes in domains as varied as road obstacles, aircraft parked at a terminal, lunar rocks, and maritime hazards.
2-s2.0-105029687549
École Polytechnique Fédérale de Lausanne
École Polytechnique Fédérale de Lausanne
Bergische Universität Wuppertal
Samsung AI Center Toronto
École Polytechnique Fédérale de Lausanne
École Polytechnique Fédérale de Lausanne
2024
Communications in Computer and Information Science; 2375 CCIS
1865-0937
1865-0929
2026-January
1
37
Online Proceedings
REVIEWED
EPFL
| Event name | Event acronym | Event place | Event date |
| | BMVC 2024 | Glasgow, UK | 2024-11-25 - 2024-11-28 |