Deep Neural Networks for Multiple Speaker Detection and Localization

He, Weipeng; Motlicek, Petr; Odobez, Jean-Marc

doi:10.1109/ICRA.2018.8461267

conference paper

2018

2018 IEEE International Conference on Robotics and Automation (ICRA)

We propose to use neural networks for simultaneous detection and localization of multiple sound sources in human-robot interaction. In contrast to conventional signal processing techniques, neural network-based sound source localization methods require fewer strong assumptions about the environment. Previous neural network-based methods have focused on localizing a single sound source and do not extend to the detection and localization of multiple sources. In this paper, we thus propose a likelihood-based encoding of the network output, which naturally allows the detection of an arbitrary number of sources. In addition, we investigate the use of sub-band cross-correlation information as features for better localization in sound mixtures, as well as three different network architectures based on different motivations. Experiments on real data recorded from a robot show that our proposed methods significantly outperform the popular spatial spectrum-based approaches.
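The abstract does not spell out the likelihood-based output encoding, so the following is only an illustrative sketch of how such an encoding could work, not the authors' implementation. It assumes one output unit per degree of azimuth, a Gaussian-like bump around each source, and peak picking with a fixed threshold. The names `N_DIRS`, `SIGMA`, `encode`, `decode`, and all numeric values are hypothetical choices made for this example.

```python
import math

N_DIRS = 360   # assumption: one output unit per degree of azimuth
SIGMA = 8.0    # assumption: width (in degrees) of the bump around each source


def angular_distance(a, b):
    """Smallest absolute difference between two azimuths, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def encode(source_azimuths):
    """Build a target vector of per-direction scores in [0, 1].

    Each direction takes the maximum, over all sources, of a Gaussian-like
    function of its angular distance to the source. With no sources the
    vector is all zeros, so the number of sources is not fixed in advance.
    """
    target = []
    for i in range(N_DIRS):
        direction = i * 360.0 / N_DIRS
        bumps = (
            math.exp(-angular_distance(direction, s) ** 2 / SIGMA ** 2)
            for s in source_azimuths
        )
        target.append(max(bumps, default=0.0))
    return target


def decode(output, threshold=0.5, neighborhood=8):
    """Return the azimuths whose scores are peaks above the threshold.

    A direction is reported when its score exceeds `threshold` and is not
    smaller than any score within `neighborhood` units on either side
    (the window wraps around the circle).
    """
    n = len(output)
    found = []
    for i, score in enumerate(output):
        if score <= threshold:
            continue
        window = [output[(i + k) % n] for k in range(-neighborhood, neighborhood + 1)]
        if score >= max(window):
            found.append(i * 360.0 / n)
    return found


if __name__ == "__main__":
    # Two hypothetical sources at 40 and 200 degrees.
    print(decode(encode([40, 200])))
```

In this sketch the same decoding step handles zero, one, or several sources, because detection reduces to counting the peaks that exceed the threshold. A trained network would output approximate scores rather than the exact targets, so the threshold and neighborhood size would need tuning.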

Type

conference paper

DOI

10.1109/ICRA.2018.8461267

Author(s)

He, Weipeng

Motlicek, Petr

Odobez, Jean-Marc

Date Issued

2018

Published in

2018 IEEE International Conference on Robotics and Automation (ICRA)

ISBN of the book

978-1-5386-3081-5

Start page

74

End page

79

URL

Related documents

http://publications.idiap.ch/index.php/publications/showcite/He_Idiap-RR-02-2018

Editorial or Peer reviewed

REVIEWED

Written at

EPFL

EPFL units

LIDIAP

Event name

2018 IEEE International Conference on Robotics and Automation (ICRA)

Event place

Brisbane, AUSTRALIA

Event date

May 21-25, 2018

Available on Infoscience

July 26, 2018

Use this identifier to reference this record

https://infoscience.epfl.ch/handle/20.500.14299/147515