Infoscience

report

Domain Adaptation and Investigation of Robustness of DNN-based Embeddings for Text-Independent Speaker Verification Using Dilated Residual Networks

Sarfjoo, Seyyed Saeed • Magimai.-Doss, Mathew • Marcel, Sébastien
2019

Robustness of extracted embeddings in cross-database scenarios is one of the main challenges in text-independent speaker verification (SV) systems. In this paper, we investigate this robustness by performing structured cross-database experiments with and without additive noise. The noise is drawn either from a seen set, where the noise type is similar to the noise used in data augmentation for training the SV model, or from an unseen set, where the distributions of additive noise in the training and evaluation sets differ. To extract robust embeddings, we investigate applying time dilation in the ResNet architecture, giving the so-called dilated residual network (DRN). The dimension and number of segment-level layers are tuned in this architecture. The proposed model with time dilation significantly outperformed the ResNet model and is comparable with state-of-the-art SV systems on the VoxCeleb1 dataset. In addition, this architecture showed significant robustness in out-of-domain scenarios. Language mismatch is one part of domain mismatch and has recently become one of the main focuses of research in SV systems. As in the image recognition field, we hypothesize that low-level convolutional neural network (CNN) layers capture domain-specific features, while high-level CNN layers are domain-independent and have more discriminative power. To adapt these domain-specific units, a combination of triplet and intra-class losses is investigated. On the evaluation part of the CMN2 dataset, the adapted model outperformed the DRN and x-vector SV systems without adaptation by 8.0% and 20.5% relative, respectively, in equal error rate.
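
The time dilation mentioned in the abstract can be illustrated with a minimal sketch. It is not the report's DRN: the kernel size, dilation rates, toy input, and the function names dilated_conv1d and receptive_field are all assumptions chosen for illustration, since the abstract gives no layer configuration. The sketch only shows the general idea that spacing filter taps further apart along the time axis widens the temporal context of each output without adding weights.

```python
from typing import List, Sequence


def dilated_conv1d(x: Sequence[float], w: Sequence[float], dilation: int = 1) -> List[float]:
    """'Valid' 1-D convolution (cross-correlation form) along time.

    Filter taps are spaced `dilation` frames apart. Hypothetical helper,
    not taken from the report.
    """
    if dilation < 1:
        raise ValueError("dilation must be >= 1")
    span = (len(w) - 1) * dilation + 1  # frames covered by one output value
    return [
        sum(w[k] * x[t + k * dilation] for k in range(len(w)))
        for t in range(len(x) - span + 1)
    ]


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Frames seen by one output after stacking stride-1 layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Toy 1-D feature track (assumed data): the value at frame t is t.
frames = [float(t) for t in range(10)]
kernel = [1.0, 0.0, -1.0]  # assumed 3-tap filter

dense = dilated_conv1d(frames, kernel, dilation=1)    # compares frames 2 apart
dilated = dilated_conv1d(frames, kernel, dilation=2)  # compares frames 4 apart

# Three stacked 3-tap layers: undilated versus assumed dilation rates 1, 2, 4.
rf_plain = receptive_field(3, [1, 1, 1])
rf_dilated = receptive_field(3, [1, 2, 4])
```

With the same three weights per layer, the dilated stack in this sketch covers 15 frames per output instead of 7, which is the usual motivation for dilation when longer temporal context is wanted.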

Type
report
Author(s)
Sarfjoo, Seyyed Saeed
Magimai.-Doss, Mathew
Marcel, Sébastien
Date Issued
2019
Publisher
Idiap
URL
http://publications.idiap.ch/downloads/reports/2019/Sarfjoo_Idiap-RR-10-2019.pdf
Written at
EPFL
EPFL units
LIDIAP
Available on Infoscience
November 7, 2019
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/162770