SINDER: Repairing the Singular Defects of DINOv2

Wang, Haoqi; Zhang, Tong; Salzmann, Mathieu

doi:10.1007/978-3-031-72667-5_2

conference paper

January 1, 2025

Computer Vision-ECCV 2024, Pt VII

18th European Conference on Computer Vision

Vision Transformer models trained on large-scale datasets, although effective, often exhibit artifacts in the patch tokens they extract. While such defects can be alleviated by re-training the entire model with additional classification tokens, the underlying reasons for the presence of these artifact tokens remain unclear. In this paper, we conduct a thorough investigation of this phenomenon, combining theoretical analysis with empirical observations. Our findings reveal that these artifacts originate from the pre-trained network itself, specifically from the leading left singular vector of the network's weights. To mitigate these defects, we propose a novel smooth regularization for fine-tuning that rectifies structural deficiencies using only a small dataset, thereby avoiding the need for complete re-training. We validate our method on various downstream tasks, including unsupervised segmentation, classification, supervised segmentation, and depth estimation, demonstrating its effectiveness in improving model performance. Code and checkpoints are available at https://github.com/haoqiwang/sinder.
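
The abstract's reference to the leading left singular vector can be made concrete with a small NumPy sketch. This is a generic linear-algebra illustration, not the authors' analysis or released code: the matrix `weight`, its rank-one construction, the random `tokens`, and the helper `leading_singular_pair` are all hypothetical stand-ins. It only shows that when one singular value of a linear map dominates, the highest-norm output points almost exactly along the leading left singular vector.

```python
import numpy as np


def leading_singular_pair(weight):
    """Return (u1, s1, v1): the leading left singular vector,
    the largest singular value, and the leading right singular vector."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    return u[:, 0], s[0], vt[0]


rng = np.random.default_rng(0)
dim = 16

# Hypothetical weight matrix: small random part plus one dominant
# rank-one component (an assumption made purely for illustration).
a = rng.standard_normal(dim)
a /= np.linalg.norm(a)
b = rng.standard_normal(dim)
b /= np.linalg.norm(b)
weight = 0.1 * rng.standard_normal((dim, dim)) + 10.0 * np.outer(a, b)

u1, s1, v1 = leading_singular_pair(weight)

# Hypothetical "tokens": random input vectors, one per row.
tokens = rng.standard_normal((200, dim))
outputs = tokens @ weight.T

# Pick the output with the largest norm and measure how closely its
# direction matches u1 (absolute cosine similarity, sign-invariant).
norms = np.linalg.norm(outputs, axis=1)
outlier = outputs[np.argmax(norms)]
alignment = abs(outlier @ u1) / np.linalg.norm(outlier)

print(f"largest singular value: {s1:.3f}")
print(f"alignment of highest-norm output with u1: {alignment:.3f}")
```

The absolute value in `alignment` is a deliberate choice: singular vectors are only defined up to sign, so the comparison should not depend on which sign the SVD routine returns.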

Type

conference paper

DOI

10.1007/978-3-031-72667-5_2

Web of Science ID

WOS:001346380800002

Author(s)

Wang, Haoqi

École Polytechnique Fédérale de Lausanne

Zhang, Tong

École Polytechnique Fédérale de Lausanne

Salzmann, Mathieu

École Polytechnique Fédérale de Lausanne

Editors

Leonardis, A

•

Ricci, E

•

Roth, S

•

Russakovsky, O

•

Sattler, T

•

Varol, G

Date Issued

2025-01-01

Publisher

Springer Nature

Publisher place

Cham

Published in

Computer Vision-ECCV 2024, Pt VII

ISBN of the book

978-3-031-72666-8

978-3-031-72667-5

Series title/Series vol.

Lecture Notes in Computer Science; 15065

ISSN (of the series)

0302-9743

1611-3349

Start page

20

End page

35

Subjects

DINOv2

•

Singular Defect

•

Unsupervised Segmentation

Editorial or Peer reviewed

REVIEWED

Written at

EPFL

EPFL units

CVLAB

SDSC-GE

Event name	Event acronym	Event place	Event date
18th European Conference on Computer Vision		Milan, Italy	2024-09-29 - 2024-10-04

Funder	Funding(s)	Grant Number	Grant URL
Swiss National Science Foundation (SNSF)		CRSII5-180359

Available on Infoscience

January 31, 2025

Use this identifier to reference this record

https://infoscience.epfl.ch/handle/20.500.14299/246140