Infoscience
conference paper

On Learning Vocal Tract System Related Speaker Discriminative Information from Raw Signal Using CNNs

Muckenhirn, Hannah • Magimai.-Doss, Mathew • Marcel, Sébastien
2018
19th Annual Conference of the International Speech Communication Association (Interspeech 2018), Vols 1-6
Proceedings of Interspeech

In recent work, we have shown that speaker verification systems can be built in which both the features and the classifier are learned directly from the raw speech signal with convolutional neural networks (CNNs). In this framework, the block processing is also determined during training, through cross-validation. It was found that the first convolution layer, which processes about 20 ms of speech, learns to model fundamental frequency information. In the present paper, inspired by speech recognition studies, we build on that framework to design a CNN-based system that models sub-segmental speech (about 2 ms of speech) in the first convolution layer, under the hypothesis that such a system should learn speaker discriminative information related to the vocal tract system. Through experimental studies on the Voxforge corpus and analysis on the American vowel dataset, we show that the proposed system (a) indeed focuses on formant regions, (b) yields a competitive speaker verification system, and (c) is complementary to the CNN-based system that models fundamental frequency information.
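The abstract contrasts a first convolution layer spanning about 20 ms of raw signal with one spanning about 2 ms. The minimal sketch below (Python, standard library only) illustrates why that choice matters, using the usual signal-processing rule of thumb that a window of N samples resolves frequency only to about f_s/N Hz: a 20 ms span can separate individual harmonics of the fundamental frequency, whereas a 2 ms span only sees the coarse spectral envelope, where formants appear. It assumes a 16 kHz sampling rate, which this record does not state. All helper names, the stride of 10 samples, and the hand-built 700 Hz band-pass kernel (a stand-in for a learned first-layer filter) are hypothetical illustrations, not the authors' code or exact analysis procedure.

```python
import cmath
import math

# Assumption: raw speech sampled at 16 kHz (not stated in this record).
SAMPLE_RATE_HZ = 16_000


def samples_for_ms(duration_ms, sample_rate_hz=SAMPLE_RATE_HZ):
    """Number of raw-signal samples spanned by a window of `duration_ms`."""
    return round(duration_ms * sample_rate_hz / 1000)


def resolution_hz(n_samples, sample_rate_hz=SAMPLE_RATE_HZ):
    """Rule-of-thumb frequency resolution (Hz) of a window of `n_samples`."""
    return sample_rate_hz / n_samples


def conv1d(signal, kernel, stride=1):
    """'Valid' 1-D convolution over raw samples, computed as cross-correlation
    (no kernel flip), which is what CNN convolution layers compute."""
    k = len(kernel)
    return [
        sum(signal[start + j] * kernel[j] for j in range(k))
        for start in range(0, len(signal) - k + 1, stride)
    ]


def bandpass_kernel(center_hz, n_taps, sample_rate_hz=SAMPLE_RATE_HZ):
    """Hypothetical stand-in for a learned filter: a Hann-windowed cosine."""
    return [
        (0.5 - 0.5 * math.cos(2 * math.pi * n / (n_taps - 1)))
        * math.cos(2 * math.pi * center_hz * n / sample_rate_hz)
        for n in range(n_taps)
    ]


def magnitude_response(kernel, n_fft=512):
    """|DFT| of the zero-padded kernel for bins 0..n_fft/2 (DC up to Nyquist)."""
    if n_fft < len(kernel):
        raise ValueError("n_fft must be at least the kernel length")
    padded = list(kernel) + [0.0] * (n_fft - len(kernel))
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * n / n_fft)
                for n, x in enumerate(padded)))
        for k in range(n_fft // 2 + 1)
    ]


def peak_frequency_hz(kernel, n_fft=512, sample_rate_hz=SAMPLE_RATE_HZ):
    """Frequency (Hz) of the largest magnitude-response bin."""
    mags = magnitude_response(kernel, n_fft)
    peak_bin = max(range(len(mags)), key=mags.__getitem__)
    return peak_bin * sample_rate_hz / n_fft


if __name__ == "__main__":
    for ms in (20, 2):
        n = samples_for_ms(ms)
        print(f"{ms} ms -> {n} samples, ~{resolution_hz(n):.0f} Hz resolution")
    # A 2 ms first-layer filter slid over raw samples (stride of 10 is hypothetical).
    kernel = bandpass_kernel(700, samples_for_ms(2))
    signal = [math.sin(2 * math.pi * 700 * t / SAMPLE_RATE_HZ) for t in range(1600)]
    out = conv1d(signal, kernel, stride=10)
    print(len(out), "outputs; filter peak near", peak_frequency_hz(kernel), "Hz")
```

Running it prints `20 ms -> 320 samples, ~50 Hz resolution` and `2 ms -> 32 samples, ~500 Hz resolution`. At roughly 500 Hz resolution, harmonics spaced at typical adult F0 values (roughly 80 to 250 Hz) blur together, so a 2 ms filter can only respond to broader spectral regions such as formants. Locating the peak of each learned filter's magnitude response, as `peak_frequency_hz` does for the toy kernel, is one possible way to inspect where a first layer focuses; it is not necessarily the analysis used in the paper.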

Type
conference paper
DOI
10.21437/Interspeech.2018-1696
Web of Science ID

WOS:000465363900234

Author(s)
Muckenhirn, Hannah
Magimai.-Doss, Mathew
Marcel, Sébastien
Date Issued

2018

Publisher

International Speech Communication Association (ISCA)

Publisher place

Baixas

Published in
19th Annual Conference of the International Speech Communication Association (Interspeech 2018), Vols 1-6
ISBN of the book

978-1-5108-7221-9

Series title/Series vol.

Interspeech

Start page

1116

End page

1120

Subjects

speaker verification • convolutional neural network • end-to-end learning • fundamental frequency • formants

URL

Related documents: http://publications.idiap.ch/downloads/papers/2018/Muckenhirn_INTERSPEECH_2018.pdf
Editorial or Peer reviewed

REVIEWED

Written at

EPFL

EPFL units
LIDIAP  
Event name
Proceedings of Interspeech

Event place
Hyderabad, India

Event date

Sep 02-06, 2018

Available on Infoscience
July 26, 2018
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/147548
Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved.