CLIP the Gap: A Single Domain Generalization Approach for Object Detection

Vidit, Vidit; Engilberge, Martin; Salzmann, Mathieu

doi:10.1109/CVPR52729.2023.00314

2023

Abstract

Single Domain Generalization (SDG) tackles the problem of training a model on a single source domain so that it generalizes to any unseen target domain. While this has been well studied for image classification, the literature on SDG object detection remains almost non-existent. To address the challenges of simultaneously learning robust object localization and representation, we propose to leverage a pre-trained vision-language model to introduce semantic domain concepts via textual prompts. We achieve this via a semantic augmentation strategy acting on the features extracted by the detector backbone, as well as a text-based classification loss. Our experiments demonstrate the benefits of our approach, which outperforms the only existing SDG object detection method, Single-DGOD [52], by 10% on its own diverse-weather driving benchmark.
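The two ingredients named in the abstract, a semantic augmentation of backbone features and a text-based classification loss, can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the abstract does not give the exact formulation, so the additive form of the augmentation, the cosine-similarity cross-entropy, the temperature value, and all names (`semantic_augment`, `text_classification_loss`, `source_prompt`, `target_prompt`) are assumptions chosen for illustration. Random vectors stand in for the embeddings a pre-trained vision-language model would produce.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit L2 norm (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def semantic_augment(feature, source_prompt_emb, target_prompt_emb):
    """Shift a feature along the direction between two prompt embeddings.

    Assumed additive form, for illustration only: the offset is the
    difference between the normalized target-domain and source-domain
    prompt embeddings.
    """
    src = normalize(source_prompt_emb)
    tgt = normalize(target_prompt_emb)
    return [f + (t - s) for f, s, t in zip(feature, src, tgt)]


def text_classification_loss(feature, class_text_embs, label, temperature=0.01):
    """Cross-entropy over cosine similarities to per-class text embeddings."""
    f = normalize(feature)
    logits = [
        sum(a * b for a, b in zip(f, normalize(t))) / temperature
        for t in class_text_embs
    ]
    # Numerically stable log-sum-exp.
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_norm - logits[label]


# Hypothetical stand-ins for text embeddings of class and domain prompts.
random.seed(0)
dim = 8
class_text_embs = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(3)]
source_prompt = [random.gauss(0, 1) for _ in range(dim)]
target_prompt = [random.gauss(0, 1) for _ in range(dim)]

# A region feature that is perfectly aligned with class 1.
feature = list(class_text_embs[1])
augmented = semantic_augment(feature, source_prompt, target_prompt)
loss = text_classification_loss(augmented, class_text_embs, label=1)
```

In this sketch the loss is lowest when a feature points in the same direction as the text embedding of its own class, so training against it on augmented features would push the classifier to stay consistent under the prompt-induced shift.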

Details

Title CLIP the Gap: A Single Domain Generalization Approach for Object Detection

Author(s) Vidit, Vidit ; Engilberge, Martin ; Salzmann, Mathieu

Published in 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Pages 3219-3229

Conference IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 17-24, 2023, Vancouver, Canada

Date 2023-01-01

Publisher IEEE Computer Society, Los Alamitos

ISSN 1063-6919

ISBN 979-8-3503-0129-8

DOI https://doi.org/10.1109/CVPR52729.2023.00314

Laboratories CVLAB

Record Appears in Scientific production and competences > I&C - School of Computer and Communication Sciences > IINFCOM > CVLAB - Computer Vision Laboratory
Scientific production and competences > Euler Center for Signal Processing
Peer-reviewed publications
Conference Papers
Work produced at EPFL
Published

Grant Swiss National Science Foundation
Swiss Innovation Agency (Innosuisse) via the BRIDGE Discovery grant: 40B2-0 194729

Record creation date 2024-02-16