PAN-RSVQA: Vision Foundation Models as Pseudo-Annotators for Remote Sensing Visual Question Answering
While the quantity of Earth observation (EO) images is constantly increasing, the benefits that can be derived from these images are still limited by the technical expertise required to run information extraction pipelines. Using natural language to break this barrier, Remote Sensing Visual Question Answering (RSVQA) aims to make EO images usable by a wider, general public. Traditional RSVQA methods use a visual encoder to extract generic features from images, which are then fused with the features of the questions entered by users. Given their multi-task nature, vision foundation models (VFMs) make it possible to go beyond such generic visual features, and can be seen as pseudo-annotators extracting diverse sets of features from a collection of inter-related tasks (detected objects, segmentation maps, scene descriptions, etc.). In this work, we propose PAN-RSVQA, a new method combining a VFM and its pseudo-annotations with RSVQA by leveraging a transformer-based multi-modal encoder. These pseudo-annotations bring diverse, naturally interpretable visual cues, as they are aligned with how humans reason about images. PAN-RSVQA therefore not only exploits the large-scale training of VFMs but also enables accurate and interpretable RSVQA. Experiments on two datasets show results on par with the state of the art while enabling enhanced interpretation of the model predictions, which we analyze via sample visual perturbations and ablations of the role of each pseudo-annotator. In addition, PAN-RSVQA is modular and easily extendable to new pseudo-annotators from other VFMs.
2-s2.0-105017852217
École Polytechnique Fédérale de Lausanne
École Polytechnique Fédérale de Lausanne
École Polytechnique Fédérale de Lausanne
Laboratoire d’Informatique Paris Descartes
École Polytechnique Fédérale de Lausanne
2025
9798331599942
2996
3007
REVIEWED
EPFL
| Event name | Event acronym | Event place | Event date |
| | CVPRW 2025 | Nashville, TN, USA | 2025-06-11 - 2025-06-12 |