research article

WildCLIP: Scene and Animal Attribute Retrieval from Camera Trap Data with Domain-Adapted Vision-Language Models

Gabeff, Valentin Alexandre Guy • Russwurm, Marc • Tuia, Devis • Mathis, Alexander
April 24, 2024
International Journal Of Computer Vision

Wildlife observation with camera traps has great potential for ethology and ecology, as it gathers data non-invasively in an automated way. However, camera traps produce large amounts of uncurated data, which are time-consuming to annotate. Existing methods to label these data automatically commonly use a fixed, pre-defined set of distinctive classes and require many labeled examples per class to be trained. Moreover, the attributes of interest are sometimes rare and difficult to find in large data collections. Large pretrained vision-language models, such as contrastive language image pretraining (CLIP), offer great promise to facilitate the annotation process of camera-trap data: images can be described in greater detail, the set of classes is not fixed and can be extended on demand, and pretrained models can help retrieve rare samples. In this work, we explore the potential of CLIP to retrieve images according to environmental and ecological attributes. We create WildCLIP by fine-tuning CLIP on wildlife camera-trap images and, to further increase its flexibility, add an adapter module that extends to novel attributes in a few-shot manner. We quantify WildCLIP's performance and show that it can retrieve novel attributes in the Snapshot Serengeti dataset. Our findings outline new opportunities to facilitate annotation processes with complex and multi-attribute captions. The code is available at https://github.com/amathislab/wildclip.
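For illustration, the sketch below shows the kind of open-vocabulary retrieval the abstract describes, using an off-the-shelf CLIP checkpoint through the Hugging Face transformers library rather than the released WildCLIP weights. The image file names, the query texts, and the ResidualAdapter module are hypothetical stand-ins; the adapter is a generic residual MLP in the spirit of CLIP-adapter-style modules, not the architecture described in the paper.

```python
# Minimal sketch (not the WildCLIP implementation): rank camera-trap images against
# free-text attribute queries with an off-the-shelf CLIP model.
import torch
import torch.nn as nn
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device).eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Hypothetical file names; any local camera-trap frames would do.
image_paths = ["trap_0001.jpg", "trap_0002.jpg", "trap_0003.jpg"]
images = [Image.open(p).convert("RGB") for p in image_paths]

# Free-text attribute queries: the vocabulary is open rather than a fixed class set.
queries = [
    "a photo of a lion resting in tall dry grass",
    "a photo of a zebra walking on a dirt road at night",
]

inputs = processor(text=queries, images=images, return_tensors="pt", padding=True).to(device)
with torch.no_grad():
    out = model(**inputs)
scores = out.logits_per_text  # [n_queries, n_images], scaled cosine similarities

for query, row in zip(queries, scores):
    best = row.argmax().item()
    print(f"{query!r} -> {image_paths[best]} (score {row[best].item():.2f})")


# A generic residual adapter (hypothetical): a small trainable MLP refines frozen image
# embeddings so that novel attributes can be learned from a few labeled examples while
# the CLIP backbone itself stays frozen.
class ResidualAdapter(nn.Module):
    def __init__(self, dim: int, hidden: int = 256, alpha: float = 0.2):
        super().__init__()
        self.alpha = alpha
        self.mlp = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Blend the adapted and the original embedding; only the adapter is trained.
        return self.alpha * self.mlp(x) + (1 - self.alpha) * x


with torch.no_grad():
    img_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
adapter = ResidualAdapter(dim=img_emb.shape[-1]).to(device)
adapted_emb = adapter(img_emb)  # would be trained on a few examples per novel attribute
```

The point of the retrieval step is that the query text alone defines what is being searched for, so no per-attribute classifier is needed; the adapter only comes into play once a handful of labeled examples for a new attribute are available.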

Type
research article
DOI
10.1007/s11263-024-02026-6
Web of Science ID
WOS:001207726100007
Author(s)
Gabeff, Valentin Alexandre Guy • Russwurm, Marc • Tuia, Devis • Mathis, Alexander
Date Issued
2024-04-24
Published in
International Journal Of Computer Vision
Subjects
Technology • Vision-Language Models • CLIP • Wildlife • Camera Traps • Few-Shot Learning • Vocabulary Replay
Peer reviewed
REVIEWED
Written at
EPFL
EPFL units
ECEO • UPAMATHIS
Funder
EPFL Lausanne
Grant Number
EPFL's SV-ENAC I-PhD program
Relation
IsSupplementedBy: https://infoscience.epfl.ch/record/311775
Available on Infoscience
May 1, 2024
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/207764