
Infoscience

EPFL, École polytechnique fédérale de Lausanne
conference paper

Pan-Rsvqa: Vision Foundation Models as Pseudo-Annotators for Remote Sensing Visual Question Answering

Chappuis, Christel • Sumbul, Gencer • Montariol, Syrielle • et al.
2025
2025 IEEE/CVF Computer Society Conference on Computer Vision and Pattern Recognition Workshops. CVPRW 2025
2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops

While the quantity of Earth observation (EO) images is constantly increasing, the benefits that can be derived from these images are still limited by the technical expertise required to run information extraction pipelines. Using natural language to break this barrier, Remote Sensing Visual Question Answering (RSVQA) aims to make EO images usable by a wider, general public. Traditional RSVQA methods use a visual encoder to extract generic features from images, which are then fused with the features of the questions entered by users. Given their multi-task nature, vision foundation models (VFMs) make it possible to go beyond such generic visual features: they can be seen as pseudo-annotators that extract diverse sets of features from a collection of inter-related tasks (detected objects, segmentation maps, scene descriptions, etc.). In this work, we propose PAN-RSVQA, a new method that combines a VFM and its pseudo-annotations with RSVQA by leveraging a transformer-based multi-modal encoder. These pseudo-annotations bring diverse, naturally interpretable visual cues, as they are aligned with how humans reason about images. Therefore, PAN-RSVQA not only exploits the large-scale training of VFMs but also enables accurate and interpretable RSVQA. Experiments on two datasets show results on par with the state of the art while enabling enhanced interpretation of the model predictions, which we analyze via visual perturbations of samples and ablations of the role of each pseudo-annotator. In addition, PAN-RSVQA is modular and easily extendable to new pseudo-annotators from other VFMs.
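The abstract does not specify how the pseudo-annotations are passed to the multi-modal encoder. The following is a minimal, stdlib-only sketch of the modular idea, assuming each pseudo-annotator emits textual cues (for example object labels or caption words) that are serialized into segment-tagged tokens next to the question, and that a per-annotator ablation simply drops one source. All names (`PseudoAnnotatorRegistry`, `build_encoder_input`, the `detection`/`caption` annotators) are hypothetical. No actual VFM, tokenizer, or transformer is included, and this is not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: a pseudo-annotator maps an image to textual cues
# (e.g. detected object labels, segmentation class names, caption words).
Annotator = Callable[[object], List[str]]


@dataclass
class PseudoAnnotatorRegistry:
    annotators: Dict[str, Annotator] = field(default_factory=dict)

    def register(self, name: str, fn: Annotator) -> None:
        # Supporting a new VFM output only requires registering one more callable.
        self.annotators[name] = fn

    def without(self, name: str) -> "PseudoAnnotatorRegistry":
        # Copy of the registry with one annotator removed, e.g. for an ablation.
        return PseudoAnnotatorRegistry(
            {k: v for k, v in self.annotators.items() if k != name}
        )

    def annotate(self, image: object) -> Dict[str, List[str]]:
        # Run every registered annotator on the same image.
        return {name: fn(image) for name, fn in self.annotators.items()}


def build_encoder_input(
    question: str, annotations: Dict[str, List[str]]
) -> List[Tuple[str, str]]:
    """Serialize the question and pseudo-annotations as (segment, token) pairs.

    The segment tag records which source each token came from, so a
    transformer encoder could add a per-source embedding to each token.
    """
    tokens: List[Tuple[str, str]] = [("cls", "[CLS]")]
    tokens += [("question", w) for w in question.lower().split()]
    for name in sorted(annotations):  # fixed order, independent of registration order
        tokens.append((name, "[SEP]"))
        tokens += [(name, cue) for cue in annotations[name]]
    return tokens


# Usage with dummy annotators standing in for real VFM outputs.
registry = PseudoAnnotatorRegistry()
registry.register("detection", lambda img: ["building", "building", "road"])
registry.register("caption", lambda img: ["residential", "area"])

full = build_encoder_input("How many buildings?", registry.annotate(None))
ablated = build_encoder_input(
    "How many buildings?", registry.without("caption").annotate(None)
)
```

Keeping each annotator behind a common callable interface is what makes adding or removing a source a one-line change. In a real system the string cues would be replaced by learned embeddings of the VFM outputs.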


Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved.