conference paper

Beyond Essentials: Nuanced and Diverse Text-to-video Retrieval

Yang, Yuchen  
Ding, Wei • Lu, Chang-Tien
2024
Proceedings - 2024 IEEE International Conference on Big Data, BigData 2024
IEEE International Conference on Big Data

The field of text-to-video retrieval has advanced significantly with the evolution of language models and large-scale pre-training on generated caption-video pairs. Current methods predominantly focus on visual and event-based details, making retrieval largely reliant on tangible aspects. However, videos encompass more than just "seen" or "heard" elements, containing diverse, nuanced layers that are often overlooked.

This work addresses this gap by introducing a method that incorporates audio, style, and emotion considerations into text-to-video retrieval through three key components. First, an augmentation block is implemented to generate additional textual information on a video's audio, style, and emotional aspects, supplementing the original caption. Second, a cross-modal audiovisual attention block fuses visual and audio data within the video, aligning it with this enriched textual information. Third, hybrid space learning is applied, using multiple latent spaces to align textual and video data, which minimizes potential conflicts between various information sources.

In standard evaluations, models are often tested on benchmark datasets that emphasize simple, short, visual and event-based queries. To more accurately assess model performance under diverse query conditions that capture the nuanced dimensions of video content, we developed a new evaluation dataset. Our results demonstrate that, while our method performs comparably with state-of-the-art models on conventional test sets, it surpasses non-pre-trained models when addressing more complex queries, as evidenced by this novel test dataset.
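The abstract names the second and third components without giving their internals, so the following is only an illustrative sketch and not the authors' implementation. It assumes plain scaled dot-product attention in which each visual frame attends over the audio frames, additive fusion followed by mean pooling, and a hybrid score computed as the average cosine similarity across several latent spaces. All function names, the random projection matrices, and the toy dimensions are hypothetical.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def audiovisual_attention(visual_frames, audio_frames):
    """Fuse modalities: each visual frame attends over all audio frames.

    Assumption: fusion is visual + attended audio, then mean pooling
    over frames to obtain a single video vector.
    """
    dim = len(visual_frames[0])
    fused = []
    for query in visual_frames:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in audio_frames]
        weights = softmax(scores)
        attended = [sum(w * frame[i] for w, frame in zip(weights, audio_frames))
                    for i in range(dim)]
        fused.append([v + a for v, a in zip(query, attended)])
    return [sum(frame[i] for frame in fused) / len(fused) for i in range(dim)]


def hybrid_space_similarity(text_vec, video_vec, spaces):
    """Average cosine similarity over several latent spaces.

    `spaces` is a list of (text_projection, video_projection) matrix
    pairs, one pair per latent space (hypothetical parameterization).
    """
    sims = [cosine(matvec(text_proj, text_vec), matvec(video_proj, video_vec))
            for text_proj, video_proj in spaces]
    return sum(sims) / len(sims)


def random_matrix(rng, rows, cols):
    """Random projection standing in for learned weights."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Toy usage: 3 visual frames, 5 audio frames, 4-dim features, 3 latent spaces.
rng = random.Random(0)
visual = [[rng.random() for _ in range(4)] for _ in range(3)]
audio = [[rng.random() for _ in range(4)] for _ in range(5)]
video_vec = audiovisual_attention(visual, audio)
text_vec = [rng.random() for _ in range(4)]
spaces = [(random_matrix(rng, 2, 4), random_matrix(rng, 2, 4)) for _ in range(3)]
print(hybrid_space_similarity(text_vec, video_vec, spaces))
```

Scoring each latent space separately and then combining the results keeps one information source (for example, emotion text) from being forced into the same geometry as another (for example, event text), which is one plausible reading of how multiple spaces could reduce conflicts between sources. In a trained system the projections would be learned rather than random.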

Type
conference paper
DOI
10.1109/BigData62323.2024.10825297
Scopus ID

2-s2.0-85218019153

Author(s)
Yang, Yuchen  

EPFL

Editors
Ding, Wei • Lu, Chang-Tien • Wang, Fusheng • Di, Liping • Wu, Kesheng • Huan, Jun • Nambiar, Raghu • Li, Jundong • Ilievski, Filip • Baeza-Yates, Ricardo
Date Issued

2024

Publisher

Institute of Electrical and Electronics Engineers Inc.

Published in
Proceedings - 2024 IEEE International Conference on Big Data, BigData 2024
DOI of the book
10.1109/BigData62323.2024
ISBN of the book

9798350362480

Start page

2549

End page

2557

Subjects

Audiovisual Archive • Computational archival science • Machine learning • Video Retrieval

Editorial or Peer reviewed

REVIEWED

Written at

EPFL

EPFL units
EMPLUS  
Event name

IEEE International Conference on Big Data

Event place

Washington, United States

Event date

2024-12-15 - 2024-12-18

Available on Infoscience
February 26, 2025
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/247246