Repository logo

Infoscience

  • English
  • French
Log In
Logo EPFL, École polytechnique fédérale de Lausanne

Infoscience

  • English
  • French
Log In
  1. Home
  2. Academic and Research Output
  3. EPFL thesis
  4. Deep Learning Approaches for Auditory Perception in Robotics
 
doctoral thesis

Deep Learning Approaches for Auditory Perception in Robotics

He, Weipeng  
2021

Auditory perception is an essential part of a robotic system in Human-Robot Interaction (HRI), and creating an artificial auditory perception system that is on par with human has been a long-standing goal for researchers. In fact, this is a challenging research topic, because in typical HRI scenarios the audio signal is often corrupted by the robot ego noise, other background noise and overlapping voices. The traditional approaches based on signal processing seek analytical solutions according to the physical law of sound propagation as well as assumptions about the signal, noise and environments. However, such approaches either assume over-simplified conditions, or create sophisticated models that do not generalize well in real situations.

This thesis introduces an alternative methodology to auditory perception in robotics by using deep learning techniques. It includes a group of novel deep learning-based approaches addressing sound source localization, speech/non-speech classification, and speaker re-identification. The deep learning-based approaches rely on neural network models that learn directly from the data without making many assumptions. They are shown by experiments with real robots to outperform the traditional methods in complex environments, where there are multiple speakers, interfering noises and no a priori knowledge about the number of sources.

In addition, this thesis addresses the issue of high cost of data collection which arises with learning-based approaches. Domain adaptation and data augmentation methods are proposed to exploit simulated data and weakly-labeled real data, so that the effort for data collection is minimized. Overall, this thesis suggests a practical and robust solution for auditory perception in robotics in the wild.

  • Files
  • Details
  • Metrics
Type
doctoral thesis
DOI
10.5075/epfl-thesis-8000
Author(s)
He, Weipeng  
Advisors
Odobez, Jean-Marc  
•
Motlicek, Petr  
Jury

Prof. Pascal Frossard (président) ; Dr Jean-Marc Odobez, Dr Petr Motlicek (directeurs) ; Prof. Jean-Philippe Thiran, Prof. Joakim Gustafson, Dr Antoine Deleforge (rapporteurs)

Date Issued

2021

Publisher

EPFL

Publisher place

Lausanne

Public defense year

2021-03-23

Thesis number

8000

Total of pages

143

Subjects

robotic auditory perception

•

deep learning

•

sound source localization

•

DOA estimation

•

domain adaptation

•

multi-task learning

•

human-robot interaction

EPFL units
LIDIAP  
Faculty
STI  
School
IEL  
Doctoral School
EDEE  
Available on Infoscience
March 19, 2021
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/176036
Logo EPFL, École polytechnique fédérale de Lausanne
  • Contact
  • infoscience@epfl.ch

  • Follow us on Facebook
  • Follow us on Instagram
  • Follow us on LinkedIn
  • Follow us on X
  • Follow us on Youtube
AccessibilityLegal noticePrivacy policyCookie settingsEnd User AgreementGet helpFeedback

Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, tous droits réservés