Infoscience

conference paper

Remote sensing visual question answering with a self-attention multi-modal encoder

Silva, João Daniel • Magalhães, João • Tuia, Devis • Martins, Bruno
November 14, 2022
GeoAI '22: Proceedings of the 5th ACM SIGSPATIAL International Workshop on AI for Geographic Knowledge Discovery

Visual Question Answering (VQA) on remote sensing imagery can help non-expert users extract information from Earth observation data. Current approaches follow a neural encoder-decoder design, combining convolutional and recurrent encoders with cross-modal fusion components. In other VQA application domains, however, state-of-the-art methods rely on self-attention, employing multi-modal encoders based on the Transformer architecture. In this work, we assess the degree to which a model based on self-attention can improve on previous methods for remote sensing VQA. Specifically, we present results with an extended version of MM-BERT, a model originally proposed for medical VQA that requires neither the extraction of region features from the images nor model pre-training with extensive amounts of data. Experiments show that the proposed method can improve results over previous approaches. Even without in-domain pre-training or specific adaptations to the remote sensing domain, and using low-resolution versions of the images as input, we achieve high accuracy on three different datasets extensively used in previous studies.

Type
conference paper
DOI
10.1145/3557918.3565874
Author(s)
Silva, João Daniel
Magalhães, João
Tuia, Devis  
Martins, Bruno
Date Issued
2022-11-14
Published in
GeoAI '22: Proceedings of the 5th ACM SIGSPATIAL International Workshop on AI for Geographic Knowledge Discovery
ISBN of the book
978-1-450395-32-8
Start page
40
End page
49
Editorial or Peer reviewed
REVIEWED
Written at
EPFL
EPFL units
ECEO  
Event name
5th ACM SIGSPATIAL International Workshop on AI for Geographic Knowledge Discovery
Event place
Seattle, Washington, USA
Event date
November 1, 2022
Available on Infoscience
February 9, 2023
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/194693