research article

Temporal Representation Learning on Monocular Videos for 3D Human Pose Estimation

Honari, Sina • Constantin, Victor • Rhodin, Helge • Salzmann, Mathieu • Fua, Pascal
2022
IEEE Transactions on Pattern Analysis and Machine Intelligence

In this paper, we propose an unsupervised feature extraction method that captures temporal information in monocular videos: we detect and encode the subject of interest in each frame and leverage contrastive self-supervised (CSS) learning to extract rich latent vectors. Instead of simply treating the latent features of nearby frames as positive pairs and those of temporally distant ones as negative pairs, as in other CSS approaches, we explicitly disentangle each latent vector into a time-variant component and a time-invariant one. We then show that applying the contrastive loss only to the time-variant features, encouraging a gradual transition in them between nearby and distant frames, and also reconstructing the input yields rich temporal features that are well suited for human pose estimation. Our approach reduces error by about 50% compared to the standard CSS strategies, outperforms other unsupervised single-view methods, and matches the performance of multi-view techniques. When 2D pose is available, our approach can extract even richer latent features and improve the 3D pose estimation accuracy, outperforming other state-of-the-art weakly supervised methods.
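
The idea of restricting the contrastive loss to a time-variant part of the latent vector, with a gradual transition between nearby and distant frames, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the split by slicing, the linear ramp between the hypothetical `near` and `far` frame gaps, and the hinge form of the loss are all assumptions made for illustration. The reconstruction term mentioned in the abstract is omitted.

```python
import numpy as np

def split_latent(z, n_variant):
    """Split latent vectors of shape (frames, dims) into a time-variant part
    (first n_variant dims) and a time-invariant part (remaining dims).
    Splitting by slicing is an assumption for this sketch."""
    return z[:, :n_variant], z[:, n_variant:]

def gradual_contrastive_loss(z_variant, near=1, far=5, margin=1.0):
    """Hinge-style contrastive loss applied to time-variant features only.

    Pairs at most `near` frames apart are treated as fully positive, pairs at
    least `far` frames apart as fully negative, and pairs in between are
    weighted by a linear ramp (a hypothetical choice of "gradual transition").
    """
    n = z_variant.shape[0]
    idx = np.arange(n)
    gap = np.abs(idx[:, None] - idx[None, :])          # temporal distance per pair
    diff = z_variant[:, None, :] - z_variant[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))           # feature distance per pair

    w_pos = np.clip((far - gap) / (far - near), 0.0, 1.0)  # 1 near, 0 far
    w_neg = 1.0 - w_pos

    pull = w_pos * dist ** 2                               # draw close frames together
    push = w_neg * np.maximum(0.0, margin - dist) ** 2     # push distant frames apart
    mask = gap > 0                                         # ignore self-pairs
    return float((pull + push)[mask].mean())

# Example: 8 frames, 6-dimensional latents, first 4 dims assumed time-variant.
rng = np.random.default_rng(0)
z = rng.normal(size=(8, 6))
z_variant, z_invariant = split_latent(z, n_variant=4)
loss = gradual_contrastive_loss(z_variant)
```

In this sketch the time-invariant part is left out of the loss entirely, so it is free to hold content that does not change over the clip, while only the time-variant part is shaped by temporal distance.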

Type: research article
DOI: 10.1109/TPAMI.2022.3215307
Author(s): Honari, Sina; Constantin, Victor; Rhodin, Helge; Salzmann, Mathieu; Fua, Pascal
Date Issued: 2022
Published in: IEEE Transactions on Pattern Analysis and Machine Intelligence
Subjects: Temporal Feature Extraction • Unsupervised Representation Learning • Contrastive Learning • 3D Human Pose
Editorial or Peer reviewed: REVIEWED
Written at: EPFL
EPFL units: CVLAB
Available on Infoscience: October 19, 2022
Use this identifier to reference this record: https://infoscience.epfl.ch/handle/20.500.14299/191481