conference paper

Towards directly modeling raw speech signal for speaker verification using CNNs

Muckenhirn, Hannah • Magimai.-Doss, Mathew • Marcel, Sébastien
2018
2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
IEEE International Conference on Acoustics, Speech and Signal Processing

Speaker verification systems traditionally extract and model cepstral features or filter bank energies from the speech signal. In this paper, inspired by the success of neural network-based approaches that model the raw speech signal directly for applications such as speech recognition, emotion recognition and anti-spoofing, we propose a speaker verification approach in which speaker-discriminative information is learned directly from the speech signal by: (a) first training a CNN-based speaker identification system that takes the raw speech signal as input and learns to classify speakers (unknown to the speaker verification system); and then (b) building a speaker detector for each speaker in the speaker verification system by replacing the output layer of the speaker identification system with two outputs (genuine, impostor), and adapting the system in a discriminative manner with enrollment speech of the speaker and impostor speech data. Our investigations on the Voxforge database show that this approach can yield systems competitive with state-of-the-art systems. An analysis of the filters in the first convolution layer shows that they emphasize information in low-frequency regions (below 1000 Hz) and implicitly learn to model fundamental frequency information in the speech signal for speaker discrimination.
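The two-stage procedure described in the abstract can be sketched in code. The snippet below is a minimal, hypothetical PyTorch illustration; the layer sizes, kernel widths, the number of background speakers and the 16 kHz / 1 s input are assumptions chosen for readability, not the configuration reported in the paper.

# Stage (a): a CNN trained for speaker identification directly on raw samples.
# Stage (b): the output layer is replaced by a two-class (genuine/impostor) head
# and the network is adapted with enrollment and impostor speech.
import torch
import torch.nn as nn

class RawSpeechCNN(nn.Module):
    """Maps a raw waveform segment to speaker (or genuine/impostor) scores."""
    def __init__(self, n_outputs: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=300, stride=100),  # first conv acts on raw samples
            nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=10, stride=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),                         # average over time
        )
        self.classifier = nn.Linear(64, n_outputs)

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # waveform: (batch, 1, samples)
        h = self.features(waveform).squeeze(-1)
        return self.classifier(h)

# Stage (a): train on a pool of background speakers (speaker identification).
model = RawSpeechCNN(n_outputs=300)          # 300 background speakers (illustrative)
# ... train with cross-entropy on speaker labels ...

# Stage (b): per enrolled speaker, swap in a two-output head and fine-tune
# discriminatively with the speaker's enrollment data plus impostor speech.
model.classifier = nn.Linear(64, 2)          # (genuine, impostor)

x = torch.randn(4, 1, 16000)                 # dummy batch: 1 s of 16 kHz audio
scores = model(x)                            # shape (4, 2): genuine/impostor logits

At verification time, each adapted two-output network acts as a detector for its target speaker, scoring test utterances directly from the raw waveform.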

Type
conference paper
DOI
10.1109/ICASSP.2018.8462165
Web of Science ID
WOS:000446384605011
Author(s)
Muckenhirn, Hannah
Magimai.-Doss, Mathew
Marcel, Sébastien
Date Issued
2018
Publisher
IEEE
Publisher place
New York
Published in
2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
ISBN of the book
978-1-5386-4658-8
Start page
4884
End page
4888
Subjects
speaker verification • convolutional neural network • end-to-end learning • fundamental frequency • recognition
URL
Related documents
http://publications.idiap.ch/downloads/papers/2018/Muckenhirn_ICASSP_2018.pdf
Related documents
http://publications.idiap.ch/index.php/publications/showcite/Muckenhirn_Idiap-RR-30-2017
Editorial or Peer reviewed
REVIEWED
Written at
EPFL
EPFL units
LIDIAP
Event name
IEEE International Conference on Acoustics, Speech and Signal Processing
Available on Infoscience
July 26, 2018
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/147546