Remote sensing visual question answering with a self-attention multi-modal encoder

Martins, Bruno

doi:10.1145/3557918.3565874

conference paper

Silva, João Daniel

Magalhães, João

Tuia, Devis

November 14, 2022

GeoAI '22: Proceedings of the 5th ACM SIGSPATIAL International Workshop on AI for Geographic Knowledge Discovery

Visual Question Answering (VQA) on remote sensing imagery can help non-expert users extract information from Earth observation data. Current approaches follow a neural encoder-decoder design, combining convolutional and recurrent encoders with cross-modal fusion components. In other VQA application domains, however, state-of-the-art methods rely on self-attention, employing multi-modal encoders based on the Transformer architecture. In this work, we assess the degree to which a model based on self-attention can improve over previous methods for remote sensing VQA. Specifically, we present results with an extended version of MM-BERT, a model originally proposed for medical VQA that requires neither the extraction of region features from the images nor model pre-training with extensive amounts of data. Experiments show that the proposed method can improve results over previous approaches. Even without in-domain pre-training or specific adaptations to the remote sensing domain, and using low-resolution versions of the images as input, we achieve high accuracy on three datasets extensively used in previous studies.

Type

conference paper

DOI

10.1145/3557918.3565874

Author(s)

Silva, João Daniel

Magalhães, João

Tuia, Devis

Martins, Bruno

Date Issued

2022-11-14

Published in

GeoAI '22: Proceedings of the 5th ACM SIGSPATIAL International Workshop on AI for Geographic Knowledge Discovery

ISBN of the book

978-1-4503-9532-8

Start page

40

End page

49

Editorial or Peer reviewed

REVIEWED

Written at

EPFL

EPFL units

ECEO

Event name

5th ACM SIGSPATIAL International Workshop on AI for Geographic Knowledge Discovery

Event place

Seattle, Washington, USA

Event date

November 1, 2022

Available on Infoscience

February 9, 2023

Use this identifier to reference this record

https://infoscience.epfl.ch/handle/20.500.14299/194693