research article

Semantics-Aware Spatial-Temporal Binaries for Cross-Modal Video Retrieval

Qi, Mengshi • Qin, Jie • Yang, Yi • Wang, Yunhong • Luo, Jiebo
January 1, 2021
IEEE Transactions on Image Processing

With the current exponential growth of video-based social networks, video retrieval using natural language is receiving ever-increasing attention. Most existing approaches tackle this task by extracting individual frame-level spatial features to represent the whole video, while ignoring visual pattern consistencies and intrinsic temporal relationships across frames. Furthermore, the semantic correspondence between natural language queries and person-centric actions in videos has not been fully explored. To address these problems, we propose a novel binary representation learning framework, named Semantics-aware Spatial-temporal Binaries (S²Bin), which simultaneously considers spatial-temporal context and semantic relationships for cross-modal video retrieval. By exploiting the semantic relationships between the two modalities, S²Bin can efficiently and effectively generate binary codes for both videos and texts. In addition, we adopt an iterative optimization scheme to learn deep encoding functions with attribute-guided stochastic training. We evaluate our model on three video datasets, and the experimental results demonstrate that S²Bin outperforms state-of-the-art methods on various cross-modal video retrieval tasks.
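The efficiency argument for binary codes is that videos and text queries can be compared with cheap bit operations once both are encoded. The sketch below illustrates only that generic retrieval step. It is not S²Bin's encoder or training procedure.

The sketch assumes that both modalities have already been mapped to real-valued embeddings of equal length. It then binarizes them by sign, as a simplifying assumption, and ranks videos by Hamming distance to the query. The helper names `binarize`, `hamming` and `rank_by_hamming` and the toy embeddings are hypothetical.

```python
from typing import List, Sequence, Tuple


def binarize(features: Sequence[float]) -> int:
    """Pack sign bits of a real-valued embedding into an int.

    Assumption for illustration: bit i is 1 if features[i] >= 0, else 0.
    A learned hashing model would produce these bits with trained encoders.
    """
    code = 0
    for i, x in enumerate(features):
        if x >= 0:
            code |= 1 << i
    return code


def hamming(a: int, b: int) -> int:
    # XOR leaves a 1 wherever the two codes disagree; counting those bits
    # gives the Hamming distance.
    return bin(a ^ b).count("1")


def rank_by_hamming(query: int, database: List[int]) -> List[Tuple[int, int]]:
    """Return (index, distance) pairs sorted by ascending Hamming distance.

    Python's sort is stable, so ties keep their original database order.
    """
    scored = [(i, hamming(query, code)) for i, code in enumerate(database)]
    return sorted(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical 4-dimensional "video" and "text" embeddings.
    video_embeddings = [
        [0.3, -1.2, 0.8, -0.1],
        [-0.5, -0.7, 0.9, 0.2],
        [0.1, 0.4, -0.3, -0.9],
    ]
    text_embedding = [0.6, -0.2, 0.5, 0.4]

    video_codes = [binarize(v) for v in video_embeddings]
    query_code = binarize(text_embedding)
    print(rank_by_hamming(query_code, video_codes))  # [(0, 1), (1, 1), (2, 3)]
```

Here each code is a single integer, so comparing a query against one database item costs one XOR and one bit count, regardless of how the embeddings were produced.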

Type: research article
DOI: 10.1109/TIP.2020.3048680
Web of Science ID: WOS:000621399700001
Author(s): Qi, Mengshi; Qin, Jie; Yang, Yi; Wang, Yunhong; Luo, Jiebo
Date Issued: 2021-01-01
Published in: IEEE Transactions on Image Processing
Volume: 30
Start page: 2989
End page: 3004
Subjects: Computer Science, Artificial Intelligence • Engineering, Electrical & Electronic • Computer Science • Engineering • semantics • binary codes • feature extraction • visualization • task analysis • natural languages • stochastic processes • cross-modal hashing • video retrieval • binary representation • spatial-temporal features • natural language

Editorial or Peer reviewed: REVIEWED
Written at: EPFL
EPFL units: CVLAB
Available on Infoscience: March 26, 2021
Use this identifier to reference this record: https://infoscience.epfl.ch/handle/20.500.14299/176673
Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved.