conference paper

Comparing CTC and LFMMI for out-of-domain adaptation of wav2vec 2.0 acoustic model

Vyas, Apoorv • Madikeri, Srikanth • Bourlard, Hervé
January 1, 2021
Interspeech 2021
Interspeech Conference

In this work, we investigate whether wav2vec 2.0 self-supervised pretraining helps mitigate the overfitting issues of connectionist temporal classification (CTC) training and thereby reduces its performance gap with flat-start lattice-free MMI (E2E-LFMMI) for automatic speech recognition with limited training data. Towards that objective, we use the pretrained wav2vec 2.0 BASE model and fine-tune it on three different datasets, including out-of-domain (Switchboard) and cross-lingual (Babel) scenarios. Our results show that for supervised adaptation of the wav2vec 2.0 model, both E2E-LFMMI and CTC achieve similar results, significantly outperforming the baselines trained only with supervised data. Fine-tuning the wav2vec 2.0 model with E2E-LFMMI and CTC, we obtain the following relative WER improvements over the supervised baseline trained with E2E-LFMMI. We get relative improvements of 40% and 44% on the clean set and 64% and 58% on the test set of Librispeech (100h), respectively. On Switchboard (300h) we obtain relative improvements of 33% and 35%, respectively. Finally, for the Babel languages, we obtain relative improvements of 26% and 23% on Swahili (38h) and 18% and 17% on Tagalog (84h), respectively.
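
As a concrete starting point, the sketch below shows one supervised CTC fine-tuning step on a pretrained wav2vec 2.0 model using the Hugging Face transformers library. This illustrates the general technique only and is not the authors' recipe (the E2E-LFMMI variant in particular is not reproduced here); the checkpoint name, dummy waveform, and transcript are stand-ins.

    # Sketch: one CTC fine-tuning step on a pretrained wav2vec 2.0 model.
    # Illustrative only; not the paper's actual training setup.
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    # Stand-in checkpoint: the paper starts from the self-supervised BASE
    # model, but this checkpoint ships with a ready-made CTC vocabulary.
    checkpoint = "facebook/wav2vec2-base-960h"
    processor = Wav2Vec2Processor.from_pretrained(checkpoint)
    model = Wav2Vec2ForCTC.from_pretrained(checkpoint)

    # Hypothetical labelled utterance: 1 s of 16 kHz audio plus transcript.
    waveform = torch.randn(16000)
    transcript = "HELLO WORLD"

    inputs = processor(waveform.numpy(), sampling_rate=16000,
                       return_tensors="pt")
    labels = processor.tokenizer(transcript, return_tensors="pt").input_ids

    # The model returns the CTC loss directly when labels are supplied.
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
    outputs = model(inputs.input_values, labels=labels)
    outputs.loss.backward()  # backprop through the whole encoder
    optimizer.step()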

Type: conference paper
DOI: 10.21437/Interspeech.2021-1683
Web of Science ID: WOS:000841879502193
Author(s): Vyas, Apoorv; Madikeri, Srikanth; Bourlard, Hervé
Date Issued: 2021-01-01
Publisher: ISCA (International Speech Communication Association)
Publisher place: Baixas
Published in: Interspeech 2021
Series title/Series vol.: Interspeech
Start page: 2861
End page: 2865
Subjects: speech recognition • wav2vec 2.0 • e2e-lfmmi • ctc • cross-lingual adaptation
Editorial or Peer reviewed: REVIEWED
Written at: EPFL
EPFL units: LIDIAP
Event name: Interspeech Conference
Event place: Brno, Czech Republic
Event date: Aug 30 - Sep 03, 2021
Available on Infoscience: September 26, 2022
Use this identifier to reference this record: https://infoscience.epfl.ch/handle/20.500.14299/190935