Learning to Fuse 2D and 3D Image Cues for Monocular Body Pose Estimation

ICCV 2017

Bugra Tekin, Pablo Márquez-Neila, Mathieu Salzmann, Pascal Fua

In this document, we provide additional quantitative analysis and report the running time of our algorithm. We also present supplementary qualitative results on the Human3.6m, HumanEva, KTH Multiview Football II and Leeds Sports Pose datasets.

By fusing 2D and 3D image cues, our approach disambiguates challenging poses involving mirroring and self-occlusion and achieves state-of-the-art performance. We provide several example videos on Human3.6m below. In each video, the first skeleton depicts our prediction and the second the ground truth. Best viewed in full-screen mode.

Walking

Sitting

Posing

We further provide predictions from HumanEva-I sequences below.

We also demonstrate the performance of our approach on KTH Multiview Football II below.

The supplementary videos are encoded with FFmpeg using the H.264 codec. If you cannot play a video, please download the VLC player at: http://www.videolan.org/vlc/index.html