Efficient Depth-based Deep Learning Methods for Multi-Party Pose Estimation
Human detection and pose estimation are essential components of any artificial system that responds to the presence of humans and acts according to human-centered tasks. Robotic systems are typical examples: for them, body pose provides fine-grained information that helps to understand people's behavior and activities and to interact with them. However, the problem remains challenging, and its difficulty grows with the unknown number of people in a typical scene and with factors such as occlusions and sensing conditions. Current state-of-the-art methods largely rely on deep Convolutional Neural Networks (CNNs) to address the task. Traditionally, the selected CNNs are very deep and overparameterized, and hence require large amounts of data to generalize well and avoid overfitting. As a consequence, they are not straightforward to deploy on the low-budget hardware typically available in practical applications such as human-robot interaction (HRI).
This thesis studies methods for efficient and reliable 2D and 3D human pose estimation using deep learning approaches. It investigates novel lightweight convolutional network architectures that achieve real-time performance in multi-person scenarios, and explores knowledge distillation methods to boost the performance of these models while preserving their efficiency. Moreover, this thesis addresses the high cost of collecting annotated data, which arises with deep learning-based approaches, by relying on a large-scale dataset of synthetic images with high variability. Domain adaptation methods and data augmentation strategies are proposed to exploit the synthetic corpus and achieve good generalization to sensor data. Additionally, this dissertation studies human 3D motion prediction framed as a sequence-to-sequence problem. Non-autoregressive transformer networks are proposed that predict all elements in parallel, which avoids the propagation of errors from previously predicted elements observed in autoregressive methods while remaining efficient.
Overall, this thesis proposes efficient and accurate deep learning solutions for designing the components of a human behavior understanding system used in Human-Robot Interaction (HRI) scenarios.
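As an illustration of the knowledge distillation idea mentioned above, the following is a minimal sketch rather than the thesis's actual formulation. It assumes that pose is represented as keypoint heatmaps and that a lightweight student network is trained against both the ground truth and the output of a larger teacher network. The function names, the `alpha` weighting, and the toy values are all hypothetical.

```python
from typing import Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equally sized, flattened heatmaps."""
    if len(a) != len(b):
        raise ValueError("heatmaps must have the same number of elements")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distillation_loss(
    student: Sequence[float],
    teacher: Sequence[float],
    target: Sequence[float],
    alpha: float = 0.5,
) -> float:
    """Blend ground-truth supervision with teacher supervision.

    alpha is a hypothetical weight: alpha = 1.0 uses only the ground truth,
    alpha = 0.0 uses only the teacher's prediction.
    """
    return alpha * mse(student, target) + (1.0 - alpha) * mse(student, teacher)


# Toy flattened 2x2 heatmaps for a single keypoint (illustrative values only).
target = [0.0, 1.0, 0.0, 0.0]   # ground-truth heatmap
teacher = [0.1, 0.8, 0.1, 0.0]  # prediction of the large teacher network
student = [0.2, 0.6, 0.1, 0.1]  # prediction of the lightweight student network

loss = distillation_loss(student, teacher, target, alpha=0.5)
print(f"distillation loss: {loss:.4f}")
```

In this sketch the teacher term acts as a softer training signal than the ground truth alone, and the extra cost is paid only during training, so the student's inference-time efficiency is unchanged.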