Visual question answering from another perspective: CLEVR mental rotation tests

Beckham, Christopher; Weiss, Martin; Golemo, Florian; Honari, Sina; Nowrouzezahrai, Derek; Pal, Christopher

doi:10.1016/j.patcog.2022.109209

research article

April 1, 2023

Pattern Recognition

Mental rotation tests of various kinds have been used extensively in psychology to understand human visual reasoning and perception. Understanding what an object or visual scene would look like from another viewpoint is a challenging problem, and it is harder still when it must be done from a single image. We explore a controlled setting in which questions are posed about the properties of a scene as it would appear if observed from another viewpoint. To do this, we have created a new version of the CLEVR dataset that we call CLEVR Mental Rotation Tests (CLEVR-MRT). Using CLEVR-MRT, we examine standard methods, show how they fall short, and then explore novel neural architectures that involve inferring volumetric representations of a scene. These volumes can be manipulated via camera-conditioned transformations to answer the question. We examine the efficacy of different model variants through rigorous ablations and demonstrate the benefit of volumetric representations. © 2022 Elsevier Ltd. All rights reserved.

Type

research article

DOI

10.1016/j.patcog.2022.109209

Web of Science ID

WOS:000900874600004

Authors

Beckham, Christopher; Weiss, Martin; Golemo, Florian; Honari, Sina; Nowrouzezahrai, Derek; Pal, Christopher

Publication date

2023-04-01

Publisher

ELSEVIER SCI LTD

Published in

Pattern Recognition

Volume

136

Article Number

109209

Subjects

Computer Science, Art...

Engineering, Electric...

Computer Science

Engineering

deep learning

computer vision

visual question answe...

contrastive learning

clevr

reconstruction

Peer reviewed

REVIEWED

EPFL units

CVLAB

Available on Infoscience

January 16, 2023

Use this identifier to reference this record

https://infoscience.epfl.ch/handle/20.500.14299/193875