Deep Non-Rigid Structure-From-Motion: A Sequence-to-Sequence Translation Perspective

Deng, Hui; Zhang, Tong; Dai, Yuchao; Shi, Jiawei; Zhong, Yiran; Li, Hongdong

doi:10.1109/TPAMI.2024.3443922

research article

2024

IEEE Transactions on Pattern Analysis and Machine Intelligence

Directly regressing the non-rigid shape and camera pose from individual 2D frames is ill-suited to the Non-Rigid Structure-from-Motion (NRSfM) problem. Such a frame-by-frame 3D reconstruction pipeline overlooks the inherent spatio-temporal nature of NRSfM, i.e., reconstructing a 3D sequence from an input 2D sequence. In this paper, we propose to solve deep sparse NRSfM from a sequence-to-sequence translation perspective, where the input 2D keypoint sequence is taken as a whole to reconstruct the corresponding 3D keypoint sequence in a self-supervised manner. First, we apply a shape-motion predictor to the input sequence to obtain an initial sequence of shapes and corresponding motions. Then, we propose the Context Layer, which enables the deep learning framework to impose sequence-level constraints based on the structural characteristics of non-rigid sequences. The Context Layer imposes the self-expressiveness regularity on non-rigid sequences with multi-head attention (MHA) at its core, combined with temporal encoding; the two act together to constrain non-rigid sequences within the deep framework. Experimental results on several datasets, including Human3.6M, CMU Mocap, and InterHand, demonstrate the superiority of our framework. The code will be made publicly available.
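To make the role of attention and temporal encoding concrete, here is a minimal sketch in plain Python. It is not the authors' Context Layer. It assumes a single attention head with no learned projections, a sinusoidal temporal encoding (the abstract does not say which encoding is used), and hypothetical function names (`temporal_encoding`, `self_expressive_attention`). It shows only the core idea that each frame's shape can be re-expressed as a weighted combination of the shapes in the same sequence.

```python
import math


def temporal_encoding(num_frames, dim):
    """Sinusoidal encoding of the frame index (an assumed choice of encoding)."""
    enc = []
    for t in range(num_frames):
        row = []
        for i in range(dim):
            angle = t / (10000 ** (2 * (i // 2) / dim))
            # Even channels use sine, odd channels use cosine.
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        enc.append(row)
    return enc


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_expressive_attention(shapes, encodings):
    """Re-express every frame as a convex combination of all frames.

    shapes:    F flattened 3D shapes, each a list of 3 * P floats
    encodings: F temporal encoding vectors
    Returns (refined_shapes, attention_weights).
    Simplification: queries and keys are the raw shape concatenated with
    its temporal encoding; a real layer would use learned projections
    and several heads.
    """
    feats = [s + e for s, e in zip(shapes, encodings)]  # list concatenation
    scale = math.sqrt(len(feats[0]))
    num_coords = len(shapes[0])
    refined, weights = [], []
    for query in feats:
        # Scaled dot-product similarity between this frame and every frame.
        scores = [sum(a * b for a, b in zip(query, key)) / scale for key in feats]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of the sequence's shapes, coordinate by coordinate.
        refined.append(
            [sum(w[j] * shapes[j][d] for j in range(len(shapes)))
             for d in range(num_coords)]
        )
    return refined, weights
```

Calling `self_expressive_attention(shapes, temporal_encoding(len(shapes), 4))` on a list of flattened shapes returns one refined shape per input frame, together with a weight matrix whose rows sum to one. Classical self-expressiveness formulations often forbid a frame from expressing itself; this sketch does not enforce that.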

Type

research article

DOI

10.1109/TPAMI.2024.3443922

Scopus ID

2-s2.0-85201443380

PubMed ID

39150802

Author(s)

Deng, Hui

Northwestern Polytechnical University

Zhang, Tong

École Polytechnique Fédérale de Lausanne

Dai, Yuchao

Northwestern Polytechnical University

Shi, Jiawei

Northwestern Polytechnical University

Zhong, Yiran

Shanghai Artificial Intelligence Laboratory

Li, Hongdong

The Australian National University

Date Issued

2024

Published in

IEEE Transactions on Pattern Analysis and Machine Intelligence

Volume

46

Issue

12

Start page

10814

End page

10828

Subjects

Non-rigid structure-from-motion (NRSfM)

self-expressiveness

self-attention

sequence-to-sequence

temporal encoding

Editorial or Peer reviewed

REVIEWED

Written at

EPFL

EPFL units

IVRL

Funder	Funding(s)	Grant Number	Grant URL
Fundamental Research Funds for the Central Universities
National Natural Science Foundation of China		61871325, 62271410
Swiss National Science Foundation		CRSII5-180359

Available on Infoscience

January 24, 2025

Use this identifier to reference this record

https://infoscience.epfl.ch/handle/20.500.14299/243535