Analysis of Language Dependent Front-End for Speaker Recognition

Motlicek, Petr

doi:10.21437/Interspeech.2018-2071

conference paper


Madikeri, Srikanth • Dey, Subhadeep • Motlicek, Petr

January 1, 2018

19th Annual Conference of the International Speech Communication Association (Interspeech 2018), Vols 1-6

19th Annual Conference of the International Speech Communication Association (INTERSPEECH 2018)

In Deep Neural Network (DNN) i-vector based speaker recognition systems, acoustic models trained for Automatic Speech Recognition (ASR) are employed to estimate the sufficient statistics for i-vector modeling. The DNN-based acoustic model is typically trained on a well-resourced language such as English. In evaluation conditions where enrollment and test data are not in English, as in the NIST SRE 2016 dataset, such a DNN acoustic model generalizes poorly. In these conditions, a conventional Universal Background Model/Gaussian Mixture Model (UBM/GMM) based i-vector extractor performs better than the DNN-based i-vector system. In this paper, we address the scenario in which an ASR system can be developed with limited resources for a language present in the evaluation condition, thus enabling the use of a DNN acoustic model instead of the UBM/GMM. Experiments are performed on the Tagalog subset of the NIST SRE 2016 dataset, assuming an open training condition. With a DNN i-vector system trained for Tagalog, a relative improvement of 12.1% is obtained over a baseline system trained for English.
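In both system types the acoustic model's role is to supply per-frame class posteriors, from which the i-vector sufficient statistics are accumulated. The sketch below shows the standard zeroth- and first-order (Baum-Welch) statistics in NumPy. It assumes the common formulation in which the posteriors come either from UBM/GMM components or from DNN senone outputs; the function names `baum_welch_stats` and `center_stats` are hypothetical and are not taken from the paper, and the demo uses random toy data.

```python
import numpy as np


def baum_welch_stats(features, posteriors):
    """Accumulate zeroth- and first-order statistics for one utterance.

    features:   (T, D) array of acoustic feature frames x_t.
    posteriors: (T, C) array of per-frame class posteriors gamma_t(c); in a
                UBM/GMM system C is the number of Gaussians, in a DNN
                i-vector system (assumed here) C is the number of senones.
    Returns N (C,) with N_c = sum_t gamma_t(c) and
            F (C, D) with F_c = sum_t gamma_t(c) x_t.
    """
    x = np.asarray(features, dtype=float)
    gamma = np.asarray(posteriors, dtype=float)
    if x.ndim != 2 or gamma.ndim != 2 or x.shape[0] != gamma.shape[0]:
        raise ValueError("expected (T, D) features and (T, C) posteriors")
    n = gamma.sum(axis=0)  # soft frame count per class
    f = gamma.T @ x        # posterior-weighted feature sum per class
    return n, f


def center_stats(n, f, means):
    """Centre first-order stats on per-class means m_c: F_c - N_c * m_c."""
    return f - n[:, None] * np.asarray(means, dtype=float)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, D, C = 200, 20, 8  # frames, feature dimension, classes (toy sizes)
    x = rng.standard_normal((T, D))
    logits = rng.standard_normal((T, C))
    # A row-wise softmax stands in for GMM or DNN posteriors (toy data).
    gamma = np.exp(logits - logits.max(axis=1, keepdims=True))
    gamma /= gamma.sum(axis=1, keepdims=True)
    n, f = baum_welch_stats(x, gamma)
    f_centered = center_stats(n, f, np.zeros((C, D)))
    print(n.sum())           # equals T up to rounding: each posterior row sums to 1
    print(f_centered.shape)  # (8, 20)
```

Replacing UBM/GMM posteriors with DNN senone posteriors changes only the `posteriors` argument; the accumulation itself is unchanged, so the quality of the acoustic model, including how well it matches the evaluation language, feeds directly into the statistics.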

Type

conference paper

DOI

10.21437/Interspeech.2018-2071

Web of Science ID

WOS:000465363900231

Author(s)

Madikeri, Srikanth

Dey, Subhadeep

Motlicek, Petr

Date Issued

2018-01-01

Publisher

ISCA (International Speech Communication Association)

Publisher place

Baixas

Published in

19th Annual Conference of the International Speech Communication Association (Interspeech 2018), Vols 1-6

ISBN of the book

978-1-5108-7221-9

Series title/Series vol.

Interspeech

Start page

1101

End page

1105

Subjects

Computer Science, Artificial Intelligence • Computer Science, Theory & Methods • Engineering, Electrical & Electronic • Computer Science • Engineering • i-vector • speaker recognition • deep neural networks

Editorial or Peer reviewed

REVIEWED

Written at

EPFL

EPFL units

LIDIAP

Event name

19th Annual Conference of the International Speech Communication Association (INTERSPEECH 2018)

Event place

Hyderabad, India

Event date

Sep 02-06, 2018

Available on Infoscience

June 18, 2019

Use this identifier to reference this record

https://infoscience.epfl.ch/handle/20.500.14299/156868