conference paper

Comparing CTC and LFMMI for out-of-domain adaptation of wav2vec 2.0 acoustic model

Vyas, Apoorv • Madikeri, Srikanth • Bourlard, Hervé
January 1, 2021
Interspeech 2021
Interspeech Conference

In this work, we investigate whether wav2vec 2.0 self-supervised pretraining helps mitigate the overfitting issues of connectionist temporal classification (CTC) training and thereby reduces its performance gap with flat-start lattice-free MMI (E2E-LFMMI) for automatic speech recognition with limited training data. Towards that objective, we use the pretrained wav2vec 2.0 BASE model and fine-tune it on three different datasets, including out-of-domain (Switchboard) and cross-lingual (Babel) scenarios. Our results show that for supervised adaptation of the wav2vec 2.0 model, both E2E-LFMMI and CTC achieve similar results, significantly outperforming the baselines trained only with supervised data. Fine-tuning the wav2vec 2.0 model with E2E-LFMMI and CTC, we obtain the following relative WER improvements over the supervised baseline trained with E2E-LFMMI. We get relative improvements of 40% and 44% on the clean set and 64% and 58% on the test set of Librispeech (100h), respectively. On Switchboard (300h) we obtain relative improvements of 33% and 35%, respectively. Finally, for the Babel languages, we obtain relative improvements of 26% and 23% on Swahili (38h) and 18% and 17% on Tagalog (84h), respectively.
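
As a concrete starting point, the sketch below shows one supervised CTC fine-tuning step on a pretrained wav2vec 2.0 model using the Hugging Face transformers library. This illustrates the general technique only and is not the authors' recipe (the E2E-LFMMI variant in particular is not reproduced here); the checkpoint name, dummy waveform, and transcript are stand-ins.

    # Sketch: one CTC fine-tuning step on a pretrained wav2vec 2.0 model.
    # Illustrative only; not the paper's actual training setup.
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    # Stand-in checkpoint: the paper starts from the self-supervised BASE
    # model, but this checkpoint ships with a ready-made CTC vocabulary.
    checkpoint = "facebook/wav2vec2-base-960h"
    processor = Wav2Vec2Processor.from_pretrained(checkpoint)
    model = Wav2Vec2ForCTC.from_pretrained(checkpoint)

    # Hypothetical labelled utterance: 1 s of 16 kHz audio plus transcript.
    waveform = torch.randn(16000)
    transcript = "HELLO WORLD"

    inputs = processor(waveform.numpy(), sampling_rate=16000,
                       return_tensors="pt")
    labels = processor.tokenizer(transcript, return_tensors="pt").input_ids

    # The model returns the CTC loss directly when labels are supplied.
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
    outputs = model(inputs.input_values, labels=labels)
    outputs.loss.backward()  # backprop through the whole encoder
    optimizer.step()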

Type: conference paper
DOI: 10.21437/Interspeech.2021-1683
Web of Science ID: WOS:000841879502193
Author(s): Vyas, Apoorv; Madikeri, Srikanth; Bourlard, Hervé
Date Issued: 2021-01-01
Publisher: ISCA (International Speech Communication Association)
Publisher place: Baixas
Published in: Interspeech 2021
Series title/Series vol.: Interspeech
Start page: 2861
End page: 2865
Subjects: speech recognition • wav2vec 2.0 • e2e-lfmmi • ctc • cross-lingual adaptation
Editorial or Peer reviewed: REVIEWED
Written at: EPFL
EPFL units: LIDIAP
Event name: Interspeech Conference
Event place: Brno, Czech Republic
Event date: Aug 30 - Sep 03, 2021
Available on Infoscience: September 26, 2022
Use this identifier to reference this record: https://infoscience.epfl.ch/handle/20.500.14299/190935