Infoscience

conference paper

SINDER: Repairing the Singular Defects of DINOv2

Wang, Haoqi
•
Zhang, Tong
•
Salzmann, Mathieu
January 1, 2025
Computer Vision – ECCV 2024, Part VII
18th European Conference on Computer Vision (ECCV)

Vision Transformer models trained on large-scale datasets, although effective, often exhibit artifacts in the patch tokens they extract. While such defects can be alleviated by re-training the entire model with additional classification tokens, the underlying reasons for the presence of these defective tokens remain unclear. In this paper, we conduct a thorough investigation of this phenomenon, combining theoretical analysis with empirical observations. Our findings reveal that these artifacts originate from the pre-trained network itself, specifically from the leading left singular vector of the network's weights. Furthermore, to mitigate these defects, we propose a novel smooth regularization for fine-tuning that rectifies these structural deficiencies using only a small dataset, thereby avoiding the need for complete re-training. We validate our method on various downstream tasks, including unsupervised segmentation, classification, supervised segmentation, and depth estimation, demonstrating its effectiveness in improving model performance. Code and checkpoints are available at https://github.com/haoqiwang/sinder.
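The abstract ties the artifacts to the leading left singular vector of the network's weights. As a hedged illustration of what that quantity is, and not the authors' implementation (their code is in the repository linked in the abstract), the NumPy sketch below estimates the leading left singular vector of a single weight matrix `W` by power iteration on `W Wᵀ`, then scores token features by their absolute cosine similarity with it. The helper names `leading_left_singular_vector` and `alignment_scores`, the synthetic rank-one-plus-noise `W`, and the injected "defective" token are all hypothetical; the paper's analysis of full transformer blocks may differ.

```python
import numpy as np


def leading_left_singular_vector(weight: np.ndarray, num_iters: int = 100, seed: int = 0) -> np.ndarray:
    """Estimate the leading left singular vector of `weight` by power iteration.

    Repeatedly applying W W^T amplifies the component along the top left
    singular vector; the result is unit-norm and defined only up to sign.
    """
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(weight.shape[0])
    u /= np.linalg.norm(u)
    for _ in range(num_iters):
        u = weight @ (weight.T @ u)
        u /= np.linalg.norm(u)
    return u


def alignment_scores(tokens: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Absolute cosine similarity between each token (row) and `direction`."""
    dots = tokens @ direction
    norms = np.linalg.norm(tokens, axis=1) * np.linalg.norm(direction)
    return np.abs(dots) / norms


# Hypothetical demo: a weight matrix with one dominant direction plus noise.
rng = np.random.default_rng(42)
d = 64
a, b = rng.standard_normal(d), rng.standard_normal(d)
W = 5.0 * np.outer(a / np.linalg.norm(a), b / np.linalg.norm(b)) + 0.1 * rng.standard_normal((d, d))

u = leading_left_singular_vector(W)
u_svd = np.linalg.svd(W)[0][:, 0]  # reference: first column of U from a full SVD
print("agreement with SVD:", abs(u @ u_svd))

# Synthetic token features; token 3 is pushed along u to mimic an artifact.
tokens = rng.standard_normal((16, d))
tokens[3] += 20.0 * u
scores = alignment_scores(tokens, u)
print("most aligned token:", int(np.argmax(scores)))
```

Power iteration needs only matrix-vector products, which is why it is a common choice for large weight matrices; for a small 64×64 example, `np.linalg.svd` is equally practical and serves here only as a cross-check.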

Type
conference paper
DOI
10.1007/978-3-031-72667-5_2
Web of Science ID

WOS:001346380800002

Author(s)
Wang, Haoqi  

École Polytechnique Fédérale de Lausanne

Zhang, Tong

École Polytechnique Fédérale de Lausanne

Salzmann, Mathieu  

École Polytechnique Fédérale de Lausanne

Editors
Leonardis, A
•
Ricci, E
•
Roth, S
•
Russakovsky, O
•
Sattler, T
•
Varol, G
Date Issued

2025-01-01

Publisher

Springer Nature

Publisher place

Cham

Published in
Computer Vision – ECCV 2024, Part VII
ISBN of the book

978-3-031-72666-8

978-3-031-72667-5

Series title/Series vol.

Lecture Notes in Computer Science; 15065

ISSN (of the series)

0302-9743

1611-3349

Start page

20

End page

35

Subjects

DINOv2

•

Singular Defect

•

Unsupervised Segmentation

Editorial or Peer reviewed

REVIEWED

Written at

EPFL

EPFL units
CVLAB  
SDSC-GE  
Event name

18th European Conference on Computer Vision (ECCV)

Event place

Milan, Italy

Event date

2024-09-29 to 2024-10-04

Funder

Swiss National Science Foundation (SNSF)

Grant number

CRSII5-180359

Available on Infoscience
January 31, 2025
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/246140
Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved.