conference paper

Training Visual Language Models with Object Detection: Grounded Change Descriptions in Satellite Images

Prado, João Luis  
•
Montariol, Syrielle  
•
Castillo-Navarro, Javiera  
2024
International Geoscience and Remote Sensing Symposium (IGARSS)
IEEE International Geoscience and Remote Sensing Symposium

Recently, generalist Vision Language Models (VLMs) have shown exceptional progress in tasks previously dominated by specialized computer vision models. This becomes more prevalent when visual grounding capabilities, such as the ability to reason over input text and image to generate bounding boxes around objects, are required. However, how these capabilities transfer to specialized domains such as remote sensing remains understudied, despite the recent increase in specialized models for Earth observation. In this work, we evaluate how grounding visual entities - by generating bounding-box coordinates - affects VLM performance in satellite imagery. To this end, we create two instruction-following tasks sourced from the xBD dataset, describing changes due to natural disasters observed in satellite images. We fine-tune several instances of MiniGPTv2, an open-source VLM with grounding capabilities, and evaluate their performance under the "grounded" vs. "not grounded" settings. We find that generating bounding boxes to refer to visual entities increases performance in tasks related to objects in the image, but only when the number of entities in the image is limited.

Type
conference paper
DOI
10.1109/IGARSS53475.2024.10641080
Scopus ID

2-s2.0-85204919621

Author(s)
Prado, João Luis  

École Polytechnique Fédérale de Lausanne

Montariol, Syrielle  

École Polytechnique Fédérale de Lausanne

Castillo-Navarro, Javiera  

École Polytechnique Fédérale de Lausanne

Tuia, Devis  

École Polytechnique Fédérale de Lausanne

Bosselut, Antoine  

École Polytechnique Fédérale de Lausanne

Date Issued

2024

Publisher

Institute of Electrical and Electronics Engineers Inc.

Published in
International Geoscience and Remote Sensing Symposium (IGARSS)
ISBN of the book

9798350360325

Start page

2745

End page

2749

Subjects

Earth Observation

•

Object Detection

•

Vision-Language Models

Editorial or Peer reviewed

REVIEWED

Written at

EPFL

EPFL units
NLP  
ECEO  
Event name

IEEE International Geoscience and Remote Sensing Symposium

Event place

Athens, Greece

Event date

2024-07-07 - 2024-07-12

Funder

Sony Group Corporation

EPFL Science Seed Fund

Allen Institute

Available on Infoscience
January 26, 2025
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/244761