Infoscience

Logo EPFL, École polytechnique fédérale de Lausanne

conference paper

Imagine the Unseen World: A Benchmark for Systematic Generalization in Visual World Models

Kim, Yeongbin • Singh, Gautam • Park, Junyeong • et al.
November 15, 2023
37th Conference on Neural Information Processing Systems (NeurIPS 2023) Track on Datasets and Benchmarks
37th Annual Conference on Neural Information Processing Systems

Systematic compositionality, or the ability to adapt to novel situations by creating a mental model of the world using reusable pieces of knowledge, remains a significant challenge in machine learning. While there has been considerable progress in the language domain, efforts towards systematic visual imagination, or envisioning the dynamical implications of a visual observation, are in their infancy. We introduce the Systematic Visual Imagination Benchmark (SVIB), the first benchmark designed to address this problem head-on. SVIB offers a novel framework for a minimal world modeling problem, where models are evaluated based on their ability to generate one-step image-to-image transformations under latent world dynamics. The framework provides benefits such as the ability to jointly optimize for systematic perception and imagination, a range of difficulty levels, and control over the fraction of possible factor combinations used during training. We provide a comprehensive evaluation of various baseline models on SVIB, offering insight into the current state of the art in systematic visual imagination. We hope that this benchmark will help advance visual systematic compositionality.
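
The idea of controlling the fraction of factor combinations used during training can be illustrated with a minimal Python sketch. This is not the SVIB implementation: the factor names (`color`, `shape`, `size`), the function `split_combinations`, and the rule that every individual factor value must appear at least once in training are assumptions made purely for illustration.

```python
import itertools
import random


def split_combinations(factors, train_fraction, seed=0):
    """Split all factor combinations into a training set and a held-out set.

    Hypothetical helper, not part of SVIB. Each individual factor value is
    guaranteed to occur in training, so held-out items are novel only as
    combinations of already-seen values.
    """
    names = sorted(factors)
    values = [factors[name] for name in names]
    all_combos = list(itertools.product(*values))

    # Core set: "diagonal" combinations that together cover every value
    # of every factor at least once.
    longest = max(len(v) for v in values)
    core = [tuple(v[i % len(v)] for v in values) for i in range(longest)]
    core_set = set(core)

    # The remaining combinations are shuffled and used to fill the training
    # set up to the requested fraction (never below the core set).
    rest = [c for c in all_combos if c not in core_set]
    random.Random(seed).shuffle(rest)
    n_train = max(round(train_fraction * len(all_combos)), len(core))
    n_extra = n_train - len(core)

    return names, core + rest[:n_extra], rest[n_extra:]


# Assumed toy factors: 4 x 4 x 4 = 64 possible combinations.
factors = {
    "color": ["red", "green", "blue", "yellow"],
    "shape": ["cube", "sphere", "cone", "torus"],
    "size": ["tiny", "small", "medium", "large"],
}
names, train, held_out = split_combinations(factors, train_fraction=0.25)
print(len(train), len(held_out))  # 16 48
```

Lowering `train_fraction` in this sketch leaves more combinations unseen during training, which makes generalization to the held-out set harder; a model would then be scored only on transformations of held-out combinations.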

Name: 2311.09064.pdf
Type: Postprint
Version: Accepted version
Access type: openaccess
License Condition: copyright
Size: 6.21 MB
Format: Adobe PDF
Checksum (MD5): 714937268b91d6f0dfa28f35ce734dd5

Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved