conference paper

MaskCLR: Attention-Guided Contrastive Learning for Robust Action Representation Learning

Abdelfattah, Mohamed Ossama Ahmed • Hassan, Mariam • Alahi, Alexandre
June 17, 2024
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024
IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

Current transformer-based skeletal action recognition models tend to focus on a limited set of joints and low-level motion patterns to predict action classes. This results in significant performance degradation under small skeleton perturbations or when the pose estimator changes between training and testing. In this work, we introduce MaskCLR, a new Masked Contrastive Learning approach for Robust skeletal action recognition. We propose an Attention-Guided Probabilistic Masking strategy that occludes the most important joints and encourages the model to explore a larger set of discriminative joints. Furthermore, we propose a Multi-Level Contrastive Learning paradigm that enforces the representations of standard and occluded skeletons to be class-discriminative, i.e., more compact within each class and more dispersed across different classes. Our approach helps the model capture high-level action semantics instead of low-level joint variations, and can be conveniently incorporated into transformer-based models. Without loss of generality, we combine MaskCLR with three transformer backbones: the vanilla transformer, DSTFormer, and STTFormer. Extensive experiments on NTU60, NTU120, and Kinetics400 show that MaskCLR consistently outperforms previous state-of-the-art methods on standard and perturbed skeletons from different pose estimators, showing improved accuracy, generalization, and robustness.
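
The two ideas named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give the masking distribution, the form of the attention scores, or the exact loss. The sketch assumes one non-negative importance score per joint, samples joints to occlude from a softmax over those scores, and uses a generic supervised contrastive (SupCon-style) loss as a stand-in for the multi-level contrastive objective. All function names and parameters (attention_guided_mask, occlude, supervised_contrastive_loss, temperature, num_masked) are hypothetical.

```python
import numpy as np


def attention_guided_mask(joint_scores, num_masked, rng, temperature=1.0):
    """Sample joints to occlude, favouring joints with high importance.

    joint_scores: shape (J,), one importance score per joint (assumed input).
    Returns a boolean array of shape (J,) with exactly num_masked True entries.
    """
    scores = np.asarray(joint_scores, dtype=np.float64)
    logits = scores / temperature
    logits = logits - logits.max()          # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()                    # softmax over joints
    chosen = rng.choice(scores.shape[0], size=num_masked, replace=False, p=probs)
    mask = np.zeros(scores.shape[0], dtype=bool)
    mask[chosen] = True
    return mask


def occlude(skeleton, mask):
    """Zero out the masked joints in every frame.

    skeleton: shape (T, J, C) = frames, joints, coordinates (assumed layout).
    """
    out = skeleton.copy()
    out[:, mask, :] = 0.0
    return out


def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Generic SupCon-style loss: same-class features attract, others repel.

    features: shape (N, D); labels: shape (N,). Stand-in for the paper's loss.
    """
    features = np.asarray(features, dtype=np.float64)
    labels = np.asarray(labels)
    z = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    self_mask = np.eye(sim.shape[0], dtype=bool)
    sim = np.where(self_mask, -np.inf, sim)  # exclude self-similarity
    sim_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - sim_max - np.log(
        np.exp(sim - sim_max).sum(axis=1, keepdims=True)
    )
    pos = (labels[:, None] == labels[None, :]) & ~self_mask
    pos_count = pos.sum(axis=1)
    valid = pos_count > 0                    # anchors with at least one positive
    per_anchor = -np.where(pos, log_prob, 0.0).sum(axis=1)
    return float((per_anchor[valid] / pos_count[valid]).mean())
```

In a training loop one would, under these assumptions, compute features for both the standard skeleton and its occluded copy, concatenate them with their shared labels, and add the contrastive term to the classification loss, so that both views of a sample are pulled toward other samples of the same class.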

Type: conference paper
Author(s): Abdelfattah, Mohamed Ossama Ahmed; Hassan, Mariam; Alahi, Alexandre
Date Issued: 2024-06-17
Published in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024
Total of pages: 8
Editorial or Peer reviewed: REVIEWED
Written at: EPFL
EPFL units: VITA
Event name: IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Event place: Seattle, Washington, USA
Event date: June 17-21, 2024
Available on Infoscience: April 12, 2024
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/207056