Infoscience

  • English
  • French
Log In
Logo EPFL, École polytechnique fédérale de Lausanne

Infoscience

  • English
  • French
Log In
  1. Home
  2. Academic and Research Output
  3. Conferences, Workshops, Symposiums, and Seminars
  4. A Simple Framework for Open-Vocabulary Zero-Shot Segmentation
 
conference paper

A Simple Framework for Open-Vocabulary Zero-Shot Segmentation

Stegmüller, Thomas • Lebailly, Tim • Ðukić, Nikola • Bozorgtabar, Behzad • Tuytelaars, Tinne • Thiran, Jean Philippe
2025
13th International Conference on Learning Representations, ICLR 2025
The Thirteenth International Conference on Learning Representations

Zero-shot classification capabilities naturally arise in models trained within a vision-language contrastive framework. Despite their classification prowess, these models struggle in dense tasks like zero-shot open-vocabulary segmentation. This deficiency is often attributed to the absence of localization cues in captions and the intertwined nature of the learning process, which encompasses both image/text representation learning and cross-modality alignment. To tackle these issues, we propose SimZSS, a Simple framework for open-vocabulary Zero-Shot Segmentation. The method is founded on two key principles: i) leveraging frozen vision-only models that exhibit spatial awareness while exclusively aligning the text encoder and ii) exploiting the discrete nature of text and linguistic knowledge to pinpoint local concepts within captions. By capitalizing on the quality of the visual representations, our method requires only image-caption pair datasets and adapts to both small curated and large-scale noisy datasets. When trained on COCO Captions across 8 GPUs, SimZSS achieves state-of-the-art results on 7 out of 8 benchmark datasets in less than 15 minutes. Our code and pretrained models are publicly available at https://github.com/tileb1/simzss.
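
The abstract's two principles (frozen, spatially aware vision features with only the text side aligned, and concepts picked out of captions as discrete words) can be illustrated with a toy, dependency-free Python sketch. It is not the SimZSS implementation; the official code is in the repository linked above. The concept vocabulary, the word-matching stand-in for linguistic concept extraction, the softmax pooling of patches by text similarity, the InfoNCE-style loss, and the temperature values are all assumptions chosen for illustration. Patch features are treated as fixed outputs of a frozen vision backbone; gradient updates to the text encoder are not modeled.

```python
import math
import re

# Hypothetical toy vocabulary; the concept bank used by SimZSS is not reproduced here.
CONCEPTS = {"dog", "cat", "grass", "sky", "car"}


def extract_concepts(caption, vocabulary=CONCEPTS):
    """Return vocabulary words found in a caption, in order of first appearance.

    Plain word matching stands in for the linguistic processing the abstract mentions.
    """
    found = []
    for word in re.findall(r"[a-z]+", caption.lower()):
        if word in vocabulary and word not in found:
            found.append(word)
    return found


def normalize(v):
    """Scale a vector to unit L2 norm; zero vectors are returned unchanged."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def pool_patches(patches, query, temperature=0.1):
    """Softmax-weighted average of patch features, weighted by similarity to a text query.

    `patches` are constants here, standing in for the output of a frozen vision model.
    """
    scores = [cosine(p, query) / temperature for p in patches]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]  # subtract max for numerical stability
    total = sum(weights)
    return [sum(w * p[d] for w, p in zip(weights, patches)) / total
            for d in range(len(patches[0]))]


def info_nce(visual, text, temperature=0.07):
    """Mean cross-entropy of matching text embedding i to visual embedding i among all candidates."""
    loss = 0.0
    for i, t in enumerate(text):
        logits = [cosine(t, v) / temperature for v in visual]
        top = max(logits)
        log_z = top + math.log(sum(math.exp(x - top) for x in logits))
        loss += log_z - logits[i]
    return loss / len(text)


def segment(patches, concept_embeddings):
    """Zero-shot segmentation: label every patch with its most similar concept embedding."""
    names = list(concept_embeddings)
    return [max(names, key=lambda n: cosine(p, concept_embeddings[n])) for p in patches]


if __name__ == "__main__":
    patches = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]   # toy "frozen" patch features
    text = {"dog": [1.0, 0.0], "grass": [0.0, 1.0]}  # toy text-encoder outputs
    concepts = extract_concepts("A dog runs on the grass.")
    pooled = [pool_patches(patches, text[c]) for c in concepts]
    print(concepts)                                   # ['dog', 'grass']
    print(round(info_nce(pooled, [text[c] for c in concepts]), 3))
    print(segment(patches, text))                     # ['dog', 'dog', 'grass']
```

In this toy setup, the loss on pooled concept features would only ever push the text embeddings toward the fixed visual ones, which is the sense in which the vision side stays frozen; at inference, `segment` needs nothing beyond the text embeddings of the candidate class names.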

Type
conference paper
Scopus ID

2-s2.0-105010207910

Author(s)
Stegmüller, Thomas  

École Polytechnique Fédérale de Lausanne

Lebailly, Tim

KU Leuven

Ðukić, Nikola

KU Leuven

Bozorgtabar, Behzad  

École Polytechnique Fédérale de Lausanne

Tuytelaars, Tinne

KU Leuven

Thiran, Jean Philippe  

École Polytechnique Fédérale de Lausanne

Editors
Vorobeychik, Yevgeniy
•
Das, Sanmay
•
Nowe, Ann
Date Issued

2025

Publisher

International Conference on Learning Representations, ICLR

Published in
13th International Conference on Learning Representations, ICLR 2025
ISBN of the book

9798331320850

Series title/Series vol.

RILEM Bookseries; 57

ISSN (of the series)

2211-0852

2211-0844

Published in
Transactions on Machine Learning Research
Volume

2025-July

Start page

72821

End page

72842

Subjects

Behavior encapsulation

•

Continual learning

•

Planning

Editorial or Peer reviewed

REVIEWED

Written at

EPFL

EPFL units
LTS5  
Event name

The Thirteenth International Conference on Learning Representations

Event acronym

ICLR 2025

Event place

Singapore

Event date

2025-04-24 to 2025-04-28

Funder / Funding(s)

Flemish Government: Onderzoeksprogramma Artificiele Intelligentie (Artificial Intelligence Research Programme)

European Research Council

Available on Infoscience
July 25, 2025
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/252654

Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved.