360Spred: Saliency Prediction for 360-Degree Videos Based on 3D Separable Graph Convolutional Networks

Predicting the saliency map of a 360-degree video is key to various downstream tasks, such as saliency-based compression and tile-based adaptive streaming. Besides static salient objects, moving targets also contribute to the saliency map, so the joint exploitation of spherical spatio-temporal information is necessary for accurate saliency prediction. Spherical spatial feature extraction, however, is hindered by the non-Euclidean geometric nature of spherical data, which makes it difficult to extract spatial features directly with traditional convolutional neural networks (CNNs). Efficiently exploiting the temporal correlation between these spherical spatial features poses another challenge, as it requires the extraction of spherical optical flows to provide explicit motion information. To address these challenges, in this paper we first propose a spherical graph-based Farneback algorithm that extracts spherical optical flows directly in the sphere domain by leveraging the GICOPix uniform sampling scheme. We then design a 3D separable graph convolutional network-based saliency prediction framework, named 360Spred, which takes both the spherical frames and the spherical optical flows as input. The proposed 360Spred framework is based on the U-Net structure, with a 3D separable graph convolution (3DSGC) operator that directly extracts visual and motion features in the sphere domain and exploits the temporal correlation of both high-level and low-level spatial features. Experimental results on two public datasets show that 360Spred achieves better saliency prediction accuracy for 360-degree videos than other baseline models.
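
The abstract describes the core operator as a 3D separable graph convolution: spatio-temporal filtering on the spherical pixel graph is factorized into a spatial graph convolution followed by a temporal convolution across frames. The sketch below is only an illustrative reading of that idea, not the authors' implementation; the class name, layer shapes, graph construction, and the simple normalized-adjacency aggregation are assumptions made for the example.

```python
# Minimal sketch of a separable spatio-temporal graph convolution, assuming
# node features live on a GICOPix-like spherical sampling graph. This is an
# illustration of the general technique, not the paper's 3DSGC code.
import torch
import torch.nn as nn


class SeparableGraphConv3D(nn.Module):
    def __init__(self, in_ch, out_ch, temporal_kernel=3):
        super().__init__()
        self.spatial = nn.Linear(in_ch, out_ch)          # feature transform after graph aggregation
        self.temporal = nn.Conv1d(out_ch, out_ch,
                                  kernel_size=temporal_kernel,
                                  padding=temporal_kernel // 2)

    def forward(self, x, adj):
        # x:   (batch, time, nodes, channels) features on spherical graph nodes
        # adj: (nodes, nodes) normalized adjacency of the spherical sampling graph
        b, t, n, c = x.shape
        h = self.spatial(torch.einsum('ij,btjc->btic', adj, x))  # spatial graph convolution
        h = torch.relu(h)
        h = h.permute(0, 2, 3, 1).reshape(b * n, -1, t)          # fold nodes into the batch dim
        h = self.temporal(h)                                      # temporal convolution over frames
        return h.reshape(b, n, -1, t).permute(0, 3, 1, 2)


# Tiny usage example on a random spherical graph (hypothetical sizes).
if __name__ == "__main__":
    nodes, frames = 64, 5
    adj = torch.rand(nodes, nodes)
    adj = (adj + adj.T) / 2
    adj = adj / adj.sum(dim=1, keepdim=True)                      # row-normalize the adjacency
    layer = SeparableGraphConv3D(in_ch=6, out_ch=16)              # e.g. RGB frame + optical flow channels
    feats = torch.randn(2, frames, nodes, 6)
    print(layer(feats, adj).shape)                                # torch.Size([2, 5, 64, 16])
```

Factorizing the operator this way keeps the graph aggregation purely spatial and handles motion with a lightweight 1D convolution along time, which is the separability property the abstract highlights.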

Type
research article
DOI
10.1109/TCSVT.2024.3407685
Scopus ID
2-s2.0-85194848484
Author(s)
Yang, Qin
Gao, Wenxuan
Li, Chenglin
Wang, Hao
Dai, Wenrui
Zou, Junni
Xiong, Hongkai
Frossard, Pascal  
Date Issued
2024-10-10
Published in
IEEE Transactions on Circuits and Systems for Video Technology
Volume
34
Issue
10
Start page
9979
End page
9996
Subjects
  • 360-degree videos
  • 3D convolution
  • Graph convolution
  • optical flow
  • saliency prediction
Editorial or Peer reviewed
REVIEWED
Written at
EPFL
EPFL units
LTS4
Funder
National Natural Science Foundation of China
Grant Number
61931023, 61932022, 62120106007, 62125109, 62250055, 62301299, 62320106003, 62371288, T2122024
Available on Infoscience
January 21, 2025
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/243057