Title: Learning Latent Representations of 3D Human Pose with Deep Neural Networks
Authors: Katircioglu, Isinsu; Tekin, Bugra; Salzmann, Mathieu; Lepetit, Vincent; Fua, Pascal
Date: 2018-01-31
DOI: 10.1007/s11263-018-1066-6
URI: https://infoscience.epfl.ch/handle/20.500.14299/144564
Type: Journal article (research article)
Keywords: 3D human pose estimation; Structured prediction; Deep learning

Abstract: Most recent approaches to monocular 3D pose estimation rely on deep learning. They either train a convolutional neural network to directly regress from an image to a 3D pose, which ignores the dependencies between human joints, or model these dependencies via a max-margin structured learning framework, which incurs a high computational cost at inference time. In this paper, we introduce a deep learning regression architecture for structured prediction of 3D human pose from monocular images or 2D joint location heatmaps. It relies on an overcomplete autoencoder to learn a high-dimensional latent pose representation that accounts for joint dependencies. We further propose an efficient Long Short-Term Memory network to enforce temporal consistency on 3D pose predictions. We demonstrate that our approach achieves state-of-the-art performance, both in structure preservation and prediction accuracy, on standard 3D human pose estimation benchmarks.
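
The abstract describes a pipeline with three components: an overcomplete autoencoder over 3D poses, a CNN that regresses images (or 2D joint heatmaps) to the latent pose code, and an LSTM that enforces temporal consistency. The following is a minimal sketch of that pipeline, not the authors' code: it assumes PyTorch, 17 joints (a 51-dimensional pose vector), a 2000-dimensional latent space, and hypothetical module names (PoseAutoencoder, ImageToLatentRegressor, TemporalLatentLSTM); all dimensions and training details are illustrative.

import torch
import torch.nn as nn


class PoseAutoencoder(nn.Module):
    """Overcomplete autoencoder: maps a 3D pose to a higher-dimensional latent code."""

    def __init__(self, pose_dim: int = 51, latent_dim: int = 2000):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(pose_dim, latent_dim), nn.ReLU())
        self.decoder = nn.Linear(latent_dim, pose_dim)

    def forward(self, pose: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(pose))


class ImageToLatentRegressor(nn.Module):
    """CNN that regresses an image (or stacked 2D heatmaps) to the latent pose code;
    the autoencoder's decoder then maps the code back to a structured 3D pose."""

    def __init__(self, in_channels: int = 3, latent_dim: int = 2000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, latent_dim)

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        return self.fc(self.features(image).flatten(1))


class TemporalLatentLSTM(nn.Module):
    """LSTM that smooths per-frame latent predictions over a video sequence."""

    def __init__(self, latent_dim: int = 2000, hidden_dim: int = 512):
        super().__init__()
        self.lstm = nn.LSTM(latent_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, latent_dim)

    def forward(self, latents: torch.Tensor) -> torch.Tensor:
        hidden, _ = self.lstm(latents)
        return self.out(hidden)


if __name__ == "__main__":
    autoencoder = PoseAutoencoder()
    regressor = ImageToLatentRegressor()
    temporal = TemporalLatentLSTM()

    # Stage 1 (sketch): train the autoencoder on ground-truth poses so the
    # latent space captures dependencies between joints.
    poses = torch.randn(8, 51)
    recon_loss = nn.functional.mse_loss(autoencoder(poses), poses)

    # Stage 2 (sketch): regress images to latent codes, then decode to 3D poses.
    images = torch.randn(8, 3, 128, 128)
    latents = regressor(images)
    pred_poses = autoencoder.decoder(latents)

    # Stage 3 (sketch): enforce temporal consistency over a sequence of frames.
    latent_seq = latents.unsqueeze(0)              # (batch=1, time=8, latent_dim)
    smoothed_poses = autoencoder.decoder(temporal(latent_seq))
    print(recon_loss.item(), pred_poses.shape, smoothed_poses.shape)

The staging above reflects the abstract's description at a high level only: regressing to an overcomplete latent space and decoding through a pose-trained decoder is what lets the predictions respect joint dependencies without a costly structured-inference step.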