Infoscience
research article

A Closer Look at Benchmarking Self-supervised Pre-training with Image Classification

Marks, Markus • Knott, Manuel • Kondapaneni, Neehar • Cole, Elijah • Defraeye, Thijs • Perez-Cruz, Fernando • Perona, Pietro
April 27, 2025
International Journal Of Computer Vision

Self-supervised learning (SSL) is a machine learning approach where the data itself provides supervision, eliminating the need for external labels. The model is forced to learn about the data's inherent structure or context by solving a pretext task. With SSL, models can learn from abundant and cheap unlabeled data, significantly reducing the cost of training models where labels are expensive or inaccessible. In Computer Vision, SSL is widely used as pre-training followed by a downstream task, such as supervised transfer, few-shot learning on smaller labeled data sets, and/or unsupervised clustering. Unfortunately, it is infeasible to evaluate SSL methods on all possible downstream tasks and objectively measure the quality of the learned representation. Instead, SSL methods are evaluated using in-domain evaluation protocols, such as fine-tuning, linear probing, and k-nearest neighbors (kNN). However, it is not well understood how well these evaluation protocols estimate the representation quality of a pre-trained model for different downstream tasks under different conditions, such as dataset, metric, and model architecture. In this work, we study how classification-based evaluation protocols for SSL correlate and how well they predict downstream performance on different dataset types. Our study includes eleven common image datasets and 26 models that were pre-trained with different SSL methods or have different model backbones. We find that in-domain linear/kNN probing protocols are, on average, the best general predictors for out-of-domain performance. We further investigate the importance of batch normalization for the various protocols and evaluate how robust correlations are for different kinds of dataset domain shifts. In addition, we challenge assumptions about the relationship between discriminative and generative self-supervised methods, finding that most of their performance differences can be explained by changes to model backbones.
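The kNN probing protocol discussed in the abstract can be sketched as follows. This is a minimal illustration, not the authors' benchmark code: it evaluates frozen feature vectors with a cosine-similarity k-nearest-neighbor vote, using small synthetic 2-D "features" in place of real encoder embeddings; the function name and toy data are assumptions for the demo.

```python
import numpy as np

def knn_probe_accuracy(train_feats, train_labels, test_feats, test_labels, k=5):
    """Evaluate frozen features with a k-nearest-neighbor probe.

    A common SSL evaluation protocol: embeddings from a frozen
    pre-trained encoder are classified by majority vote among the
    k most similar training embeddings (cosine similarity here).
    """
    # L2-normalize so dot products equal cosine similarity
    train = train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)
    test = test_feats / np.linalg.norm(test_feats, axis=1, keepdims=True)
    sims = test @ train.T                      # (n_test, n_train) similarities
    nn_idx = np.argsort(-sims, axis=1)[:, :k]  # indices of k most similar
    # Majority vote over the labels of the k neighbors
    preds = np.array([np.bincount(train_labels[row]).argmax() for row in nn_idx])
    return float((preds == test_labels).mean())

# Toy demo: two well-separated synthetic "feature" clusters
rng = np.random.default_rng(0)
f0 = rng.normal(loc=(+3.0, 0.0), scale=0.5, size=(50, 2))
f1 = rng.normal(loc=(-3.0, 0.0), scale=0.5, size=(50, 2))
train_feats = np.vstack([f0[:40], f1[:40]])
train_labels = np.array([0] * 40 + [1] * 40)
test_feats = np.vstack([f0[40:], f1[40:]])
test_labels = np.array([0] * 10 + [1] * 10)
acc = knn_probe_accuracy(train_feats, train_labels, test_feats, test_labels, k=5)
print(acc)
```

In the paper's setting, `train_feats` and `test_feats` would be embeddings produced by the frozen pre-trained backbone on the in-domain dataset; the probe itself has no trainable parameters, which is why it is cheap enough to run across many models and datasets.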

Type
research article
DOI
10.1007/s11263-025-02402-w
Web of Science ID

WOS:001476502600001

Author(s)
Marks, Markus

California Institute of Technology

Knott, Manuel

École Polytechnique Fédérale de Lausanne

Kondapaneni, Neehar

California Institute of Technology

Cole, Elijah

Altos Labs

Defraeye, Thijs

Swiss Federal Institutes of Technology Domain

Perez-Cruz, Fernando  

École Polytechnique Fédérale de Lausanne

Perona, Pietro

California Institute of Technology

Date Issued

2025-04-27

Publisher

Springer

Published in
International Journal Of Computer Vision
Subjects

Computer vision • Self-supervised learning • Benchmarking • Image classification

Editorial or Peer reviewed

REVIEWED

Written at

EPFL

EPFL units
EPFL

Funding

Simons Foundation

ETH Zurich Doc.Mobility Fellowship

NIH R01 MH123612A

United States Department of Health & Human Services

Available on Infoscience
May 6, 2025
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/249844