Infoscience

research article

Visual question answering from another perspective: CLEVR mental rotation tests *

Beckham, Christopher • Weiss, Martin • Golemo, Florian • Honari, Sina • Nowrouzezahrai, Derek • Pal, Christopher
April 1, 2023
Pattern Recognition

Different types of mental rotation tests have been used extensively in psychology to understand human visual reasoning and perception. Understanding what an object or visual scene would look like from another viewpoint is a challenging problem that is made even harder if it must be performed from a single image. We explore a controlled setting whereby questions are posed about the properties of a scene if that scene were observed from another viewpoint. To do this, we have created a new version of the CLEVR dataset that we call CLEVR Mental Rotation Tests (CLEVR-MRT). Using CLEVR-MRT, we examine standard methods, show how they fall short, and then explore novel neural architectures that involve inferring volumetric representations of a scene. These volumes can be manipulated via camera-conditioned transformations to answer the question. We compare different model variants through rigorous ablations and demonstrate the efficacy of volumetric representations. © 2022 Elsevier Ltd. All rights reserved.

Type: research article
DOI: 10.1016/j.patcog.2022.109209
Web of Science ID: WOS:000900874600004
Author(s): Beckham, Christopher; Weiss, Martin; Golemo, Florian; Honari, Sina; Nowrouzezahrai, Derek; Pal, Christopher
Date Issued: 2023-04-01
Publisher: ELSEVIER SCI LTD
Published in: Pattern Recognition
Volume: 136
Article Number: 109209
Subjects: Computer Science, Artificial Intelligence • Engineering, Electrical & Electronic • Computer Science • Engineering • deep learning • computer vision • visual question answering • contrastive learning • clevr • reconstruction
Editorial or Peer reviewed: REVIEWED
Written at: EPFL
EPFL units: CVLAB
Available on Infoscience: January 16, 2023
Use this identifier to reference this record: https://infoscience.epfl.ch/handle/20.500.14299/193875

Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved