Infoscience

doctoral thesis

Efficient Depth-based Deep Learning Methods for Multi-Party Pose Estimation

Martinez Gonzalez, Angel Noe  
2021

Human detection and pose estimation are essential components of any artificial system that must respond to the presence of humans and act on human-centered tasks. Robotic systems are typical examples: for them, body pose provides fine-grained information that is useful for understanding people's behavior and activities and for interacting with them. However, this remains a challenging research topic, made harder by the unknown number of people in a typical scene and by factors such as occlusions and sensing conditions. Current state-of-the-art methods largely rely on deep Convolutional Neural Networks (CNNs) to address the task. Traditionally, the selected CNNs are very deep and overparameterized, and hence require large amounts of data to generalize well and avoid overfitting. As a consequence, they are not straightforward to deploy on the low-budget hardware typically available in practical applications such as Human-Robot Interaction (HRI).

This thesis studies methods for efficient and reliable 2D and 3D human pose estimation using deep learning approaches. It investigates novel lightweight convolutional network architectures that achieve real-time performance in multi-person scenarios, and explores knowledge distillation methods to boost the performance of these models while preserving their efficiency. Moreover, this thesis addresses the high cost of collecting annotated data for deep learning-based approaches by relying on a large-scale dataset of synthetic images with high variability. Domain adaptation methods and data augmentation strategies are proposed to exploit the synthetic corpus and achieve good generalization on sensor data. Additionally, this dissertation studies human 3D motion prediction framed as a sequence-to-sequence problem. Non-autoregressive transformer neural networks are proposed that predict all elements of the sequence in parallel. This keeps inference efficient and avoids the error propagation observed in autoregressive methods, where each predicted element is fed back as input for the next.
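
The contrast between autoregressive and non-autoregressive prediction can be illustrated with a toy sketch. This is not the thesis's transformer model. It assumes a one-dimensional "joint angle" that grows by 1.0 per time step and a hypothetical predictor with a fixed error of `step_bias` per prediction; all function names and parameters are invented for illustration. It only shows how feeding predictions back compounds the error, while predicting every element directly from the observed input does not.

```python
from typing import List


def autoregressive_rollout(last_observed: float, steps: int, step_bias: float) -> List[float]:
    """Predict one step at a time, feeding each prediction back as the next input."""
    preds = []
    current = last_observed
    for _ in range(steps):
        # Hypothetical one-step model: true increment (1.0) plus a fixed error.
        # The error is added to an already-erroneous input, so it accumulates.
        current = current + 1.0 + step_bias
        preds.append(current)
    return preds


def non_autoregressive_predict(last_observed: float, steps: int, step_bias: float) -> List[float]:
    """Predict every future step directly from the observation; nothing is fed back."""
    # Each element depends only on the observed input and its target index k,
    # so the elements are independent of each other and could be computed in parallel.
    return [last_observed + float(k) + step_bias for k in range(1, steps + 1)]


def max_abs_error(preds: List[float], last_observed: float = 0.0) -> float:
    """Largest deviation from the toy ground truth last_observed + k at step k."""
    return max(abs(p - (last_observed + k)) for k, p in enumerate(preds, start=1))


if __name__ == "__main__":
    ar = autoregressive_rollout(0.0, steps=5, step_bias=0.125)
    nar = non_autoregressive_predict(0.0, steps=5, step_bias=0.125)
    print("autoregressive max error:    ", max_abs_error(ar))
    print("non-autoregressive max error:", max_abs_error(nar))
```

Under these assumptions the autoregressive error grows linearly with the prediction horizon, whereas the non-autoregressive error stays at the per-element level. A real model's per-element error would of course not be constant; the sketch isolates only the dependency structure.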

Overall, this thesis proposes several efficient and accurate deep learning solutions for building the components of a human behaviour understanding system used in Human-Robot Interaction (HRI) scenarios.

Name: EPFL_TH8429.pdf
Type: N/a
Access type: openaccess
License Condition: Copyright
Size: 37.03 MB
Format: Adobe PDF
Checksum (MD5): 7354a668c08b9e69fa08deb9d0d84728


Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved