Infoscience

EPFL, École polytechnique fédérale de Lausanne

doctoral thesis

Interact with Earth Observation Images using AI: Transparent Methods and Evaluation for Visual Question Answering

Tartini-Chappuis, Christel  
2025

Now is an exciting time for the domain of Earth observation (EO), with a multitude of diverse sensors looking at the planet from satellites, airplanes or drones. The volume of imagery acquired is massive and holds great potential for a variety of applications. However, the ability to extract useful insights from the imagery, and thus realize the full potential of EO, is limited by a technical barrier: the skills necessary to retrieve specific information from this unique resource. While a large proportion of the population has become familiar with very-high-resolution optical images, the use of data-driven pipelines to efficiently retrieve the content of interest from data of various spatial and spectral resolutions is technical and task-specific. These limitations create a gap between available EO data and potential, non-specialist end-users. To tackle this challenge, the task of remote sensing visual question answering (RSVQA) proposes to use natural language to enable interactions, through questions and answers, between EO data and end-users.

The goals of this thesis are to improve the understanding of RSVQA systems by investigating their different components, and to propose transparent and innovative methodologies and evaluation strategies. The emphasis is on transparency: on the one hand, designing architectures that enhance the interpretability of the answer predictions by providing supporting insights, and on the other hand, formulating evaluation metrics that better capture the performance and robustness of the systems.

The first part of this thesis is dedicated to analytical studies. Different strategies to combine representations of the images and the questions are compared in terms of both performance and efficiency. Next, the language model encoder used to produce the question representation is considered, contrasting the previously standard recurrent neural network with the modern attention-based transformer. The benefit of fine-tuning the pre-trained encoders is also examined. Fine-tuning can affect the robustness of an RSVQA model, because the model may learn the language biases present in the dataset. The pitfall of language biases in RSVQA is therefore thoroughly studied in order to propose evaluation metrics for both datasets and models.
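
As an illustration of how language bias can be exposed, the sketch below computes the accuracy of a "blind" baseline that never looks at the image and always predicts the most frequent training answer for each question type. This is a generic diagnostic, not necessarily one of the metrics proposed in the thesis; the `(question_type, answer)` data format and the function name are assumptions made for the example.

```python
from collections import Counter, defaultdict


def prior_baseline_accuracy(train, test):
    """Accuracy of a baseline that ignores the image entirely.

    train, test: lists of (question_type, answer) pairs (hypothetical format).
    """
    # Count how often each answer occurs for each question type.
    answers_by_type = defaultdict(Counter)
    for question_type, answer in train:
        answers_by_type[question_type][answer] += 1

    # The baseline always predicts the most frequent answer per type.
    most_common = {
        question_type: counts.most_common(1)[0][0]
        for question_type, counts in answers_by_type.items()
    }

    if not test:
        return 0.0
    correct = sum(
        1 for question_type, answer in test
        if most_common.get(question_type) == answer
    )
    return correct / len(test)


train = [("presence", "yes"), ("presence", "yes"),
         ("presence", "no"), ("count", "0")]
test = [("presence", "yes"), ("presence", "no"),
        ("count", "0"), ("count", "3")]
blind_accuracy = prior_baseline_accuracy(train, test)
```

A high accuracy for such a baseline suggests that many questions in the dataset can be answered from language priors alone, so a model's overall score should be read with that in mind.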

In the second part, the focus is on methodological development. The prompt-RSVQA architecture describes the image in text, which is then provided as context, along with the question, to a language model. The availability of this additional semantic information makes it possible to evaluate both modalities separately. Building on this, the multi-task prompt-RSVQA model focuses on explicitly detecting objects in the visual inputs to improve the predictions for numerical questions and to directly visualize their answers in the image. In PAN-RSVQA, the variety of perspectives used to describe images is further enhanced, and the semantic bottleneck imposed in the previous proposals is enriched by using detailed vector representations of the visual predictions instead of their distinct labels. Across these three proposals, the motivation is to develop methodologies that go beyond an opaque system and toward more transparency and interpretability, with the end goal of facilitating trustworthy interactions between EO data and diverse end-user applications.
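
The idea of describing the image in text and passing it as context to a language model can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the label list stands in for the output of a visual model, and the prompt template, `describe_image` and `build_prompt` are hypothetical names introduced for the example.

```python
from collections import Counter


def describe_image(labels):
    """Turn predicted visual labels (hypothetical input) into a text context."""
    if not labels:
        return "Image content: nothing detected."
    counts = Counter(labels)
    # Sort by label so the description is deterministic.
    parts = [f"{count} {label}" for label, count in sorted(counts.items())]
    return "Image content: " + ", ".join(parts) + "."


def build_prompt(labels, question):
    """Combine the textual image description with the question."""
    return f"{describe_image(labels)} Question: {question} Answer:"


prompt = build_prompt(
    ["building", "building", "road"],
    "How many buildings are there?",
)
```

Because the visual content is exposed as readable text, the description can be inspected on its own, which is one way the intermediate step supports interpretability.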

Name: EPFL_TH11143.pdf
Type: Main Document
Version: Published version
Access type: openaccess
License Condition: N/A
Size: 16.32 MB
Format: Adobe PDF
Checksum (MD5): f063979482a29101afbce7f33391a6a8

Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved