360Spred: Saliency Prediction for 360-Degree Videos Based on 3D Separable Graph Convolutional Networks

Predicting the saliency map of a 360-degree video is key to various downstream tasks, such as saliency-based compression and tile-based adaptive streaming. Besides static salient objects, moving targets also contribute to the saliency map, so the joint exploitation of spherical spatio-temporal information is necessary for accurate saliency prediction. Spherical spatial feature extraction, however, is hindered by the non-Euclidean geometric nature of spherical data, which makes it difficult to extract spatial features directly with traditional convolutional neural networks (CNNs). Efficiently exploiting the temporal correlation between these spherical spatial features poses another challenge, as it requires the extraction of spherical optical flows to provide explicit motion information. To address these challenges, in this paper we first propose a spherical graph-based Farneback algorithm that extracts spherical optical flows directly in the sphere domain by leveraging the GICOPix uniform sampling scheme. We then design a 3D separable graph convolutional network-based saliency prediction framework, named 360Spred, which takes both the spherical frames and the spherical optical flows as input. The proposed 360Spred framework is based on the U-Net structure, with a 3D separable graph convolution (3DSGC) operator that directly extracts visual and motion features in the sphere domain and exploits the temporal correlation of both high-level and low-level spatial features. Experimental results on two public datasets show that 360Spred achieves better saliency prediction accuracy for 360-degree videos than other baseline models.
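
The abstract describes the core operator as a 3D separable graph convolution: spatio-temporal filtering on the spherical pixel graph is factorized into a spatial graph convolution followed by a temporal convolution across frames. The sketch below is only an illustrative reading of that idea, not the authors' implementation; the class name, layer shapes, graph construction, and the simple normalized-adjacency aggregation are assumptions made for the example.

```python
# Minimal sketch of a separable spatio-temporal graph convolution, assuming
# node features live on a GICOPix-like spherical sampling graph. This is an
# illustration of the general technique, not the paper's 3DSGC code.
import torch
import torch.nn as nn


class SeparableGraphConv3D(nn.Module):
    def __init__(self, in_ch, out_ch, temporal_kernel=3):
        super().__init__()
        self.spatial = nn.Linear(in_ch, out_ch)          # feature transform after graph aggregation
        self.temporal = nn.Conv1d(out_ch, out_ch,
                                  kernel_size=temporal_kernel,
                                  padding=temporal_kernel // 2)

    def forward(self, x, adj):
        # x:   (batch, time, nodes, channels) features on spherical graph nodes
        # adj: (nodes, nodes) normalized adjacency of the spherical sampling graph
        b, t, n, c = x.shape
        h = self.spatial(torch.einsum('ij,btjc->btic', adj, x))  # spatial graph convolution
        h = torch.relu(h)
        h = h.permute(0, 2, 3, 1).reshape(b * n, -1, t)          # fold nodes into the batch dim
        h = self.temporal(h)                                      # temporal convolution over frames
        return h.reshape(b, n, -1, t).permute(0, 3, 1, 2)


# Tiny usage example on a random spherical graph (hypothetical sizes).
if __name__ == "__main__":
    nodes, frames = 64, 5
    adj = torch.rand(nodes, nodes)
    adj = (adj + adj.T) / 2
    adj = adj / adj.sum(dim=1, keepdim=True)                      # row-normalize the adjacency
    layer = SeparableGraphConv3D(in_ch=6, out_ch=16)              # e.g. RGB frame + optical flow channels
    feats = torch.randn(2, frames, nodes, 6)
    print(layer(feats, adj).shape)                                # torch.Size([2, 5, 64, 16])
```

Factorizing the operator this way keeps the graph aggregation purely spatial and handles motion with a lightweight 1D convolution along time, which is the separability property the abstract highlights.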

Type
research article
DOI
10.1109/TCSVT.2024.3407685
Scopus ID
2-s2.0-85194848484
Author(s)
Yang, Qin
Gao, Wenxuan
Li, Chenglin
Wang, Hao
Dai, Wenrui
Zou, Junni
Xiong, Hongkai
Frossard, Pascal  
Date Issued
2024-10-10
Published in
IEEE Transactions on Circuits and Systems for Video Technology
Volume
34
Issue
10
Start page
9979
End page
9996
Subjects
  • 360-degree videos
  • 3D convolution
  • Graph convolution
  • optical flow
  • saliency prediction
Editorial or Peer reviewed
REVIEWED
Written at
EPFL
EPFL units
LTS4
Funder
National Natural Science Foundation of China
Grant Number
61931023, 61932022, 62120106007, 62125109, 62250055, 62301299, 62320106003, 62371288, T2122024
Available on Infoscience
January 21, 2025
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/243057