Learning to Align Sequential Actions in the Wild

Liu, Weizhe; Tekin, Bugra; Coskun, Huseyn; Vineet, Vibhav; Fua, Pascal

doi:10.1109/CVPR52688.2022.00222

conference paper

2022

2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022)

State-of-the-art methods for self-supervised sequential action alignment rely on deep networks that find correspondences across videos in time. They either learn a frame-to-frame mapping across sequences, which does not leverage temporal information, or assume monotonic alignment between each video pair, which ignores variations in the order of actions. As such, these methods are not able to deal with common real-world scenarios that involve background frames or videos that contain non-monotonic sequences of actions. In this paper, we propose an approach to align sequential actions in the wild that involve diverse temporal variations. To this end, we enforce temporal priors on the optimal transport matrix, which leverages temporal consistency while allowing for variations in the order of actions. Our model accounts for both monotonic and non-monotonic sequences and handles background frames that should not be aligned. We demonstrate that our approach consistently outperforms the state of the art in self-supervised sequential action representation learning on four different benchmark datasets.
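The idea of placing a temporal prior on an optimal transport matrix can be illustrated with a small sketch. The code below is not the authors' implementation: it is a generic entropy-regularized (Sinkhorn) transport between two sequences of frame embeddings, where the kernel is multiplied by a Gaussian band around the diagonal so that temporally close frames are softly favored while out-of-order matches remain possible. The function names, the Gaussian form of the prior, and the parameters `eps`, `sigma`, and `n_iters` are all assumptions made for illustration, and background-frame handling is not modeled.

```python
import numpy as np


def temporal_prior(n, m, sigma=0.2):
    """Hypothetical prior: Gaussian band around the diagonal in normalized time."""
    ti = np.arange(n)[:, None] / max(n - 1, 1)
    tj = np.arange(m)[None, :] / max(m - 1, 1)
    return np.exp(-((ti - tj) ** 2) / (2.0 * sigma ** 2))


def sinkhorn_with_prior(x, y, eps=0.1, sigma=0.2, n_iters=200):
    """Soft alignment between sequences x (n, d) and y (m, d).

    Illustrative only: plain Sinkhorn iterations with uniform marginals,
    where the Gibbs kernel is reweighted by the temporal prior above.
    """
    n, m = len(x), len(y)
    # Squared Euclidean cost between every pair of frame embeddings.
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    cost = cost / (cost.max() + 1e-12)  # scale to [0, 1] for stability
    # The prior softly down-weights temporally distant pairs without forbidding them.
    kernel = np.exp(-cost / eps) * temporal_prior(n, m, sigma)
    a = np.full(n, 1.0 / n)  # uniform mass over frames of x
    b = np.full(m, 1.0 / m)  # uniform mass over frames of y
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        u = a / (kernel @ v)
        v = b / (kernel.T @ u)
    # Transport plan: entry (i, j) is the soft match between frame i and frame j.
    return u[:, None] * kernel * v[None, :]
```

Under these assumptions, `sinkhorn_with_prior(x, y)` returns an `(n, m)` matrix of non-negative weights summing to one; taking the row-wise `argmax` gives one possible hard alignment, which need not be monotonic.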

Name: Learning_to_Align_Sequential_Actions_in_the_Wild.pdf
Type: Postprint
Version: Accepted version
Access type: Open access
License condition: CC BY
Size: 7.67 MB
Format: Adobe PDF
Checksum (MD5): 1b877811a9c9e54a522075f866fb7fcd