Abstract

In this paper, we explore various approaches to semi-supervised learning in an end-to-end automatic speech recognition (ASR) framework. The first step in our approach involves training a seed model on the limited amount of labelled data. Additional unlabelled speech data is employed through a data-selection mechanism to obtain the best hypothesized output, which is further used to retrain the seed model. However, the uncertainties of the model may not be well captured with a single hypothesis. In contrast to this technique, we apply a dropout mechanism to capture the uncertainty by obtaining multiple hypothesized text transcripts of a speech recording. We assume that the diversity of automatically generated transcripts for an utterance will implicitly increase the reliability of the model. Finally, the data-selection process is also applied to these hypothesized transcripts to reduce the uncertainty. Experiments on the freely available TEDLIUM corpus and a proprietary Adobe internal dataset show that the proposed approach significantly reduces ASR errors compared to the baseline model.
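The sketch below is a minimal, illustrative outline of the self-training loop described above (seed training, multiple dropout decoding passes, agreement-based data selection, retraining). All components here (ToySeedModel, hypothesis_agreement, the agreement threshold) are hypothetical stand-ins introduced for exposition, not the authors' implementation or any real ASR toolkit.

import random
from collections import Counter

class ToySeedModel:
    """Stand-in for an end-to-end ASR model trained on labelled data."""
    def __init__(self, labelled_pairs):
        # "Training" here just memorises labelled transcripts keyed by utterance id.
        self.memory = dict(labelled_pairs)

    def decode(self, utterance, dropout=False):
        # With dropout enabled, occasionally perturb the hypothesis to mimic
        # the stochastic decoding passes used to expose model uncertainty.
        hyp = self.memory.get(utterance, "unknown transcript")
        if dropout and random.random() < 0.3:
            hyp = hyp + " <noise>"
        return hyp

def hypothesis_agreement(hypotheses):
    """Return the majority hypothesis and the fraction of passes that agree
    with it, used here as a simple confidence score for data selection."""
    counts = Counter(hypotheses)
    best, freq = counts.most_common(1)[0]
    return best, freq / len(hypotheses)

def self_train(labelled, unlabelled, n_passes=5, threshold=0.8):
    model = ToySeedModel(labelled)                # 1. seed model on labelled data
    pseudo_labelled = []
    for utt in unlabelled:                        # 2. decode unlabelled speech
        hyps = [model.decode(utt, dropout=True)   #    multiple dropout passes
                for _ in range(n_passes)]
        best, confidence = hypothesis_agreement(hyps)
        if confidence >= threshold:               # 3. select confident hypotheses
            pseudo_labelled.append((utt, best))
    # 4. retrain on labelled plus selected pseudo-labelled data
    return ToySeedModel(list(labelled) + pseudo_labelled)

if __name__ == "__main__":
    labelled = [("utt1", "hello world"), ("utt2", "speech recognition")]
    unlabelled = ["utt1", "utt2", "utt3"]
    retrained = self_train(labelled, unlabelled)
    print(retrained.decode("utt1"))

In practice the seed model would be a trained end-to-end ASR network and the selection score could be any measure of agreement or confidence over the dropout-sampled hypotheses; the loop structure, however, follows the steps summarised in the abstract.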

Details