Transcribing Mandarin Broadcast Speech Using Multi-Layer Perceptron Acoustic Features

Valente, Fabio; Magimai.-Doss, Mathew; Plahl, Christian; Ravuri, Suman; Wang, Wen

doi:10.1109/TASL.2011.2139206

2011

Abstract

Recently, several multi-layer perceptron (MLP)-based front-ends have been developed and used for Mandarin speech recognition, often showing significant complementary properties to conventional spectral features. Although these front-ends are widely used in multiple Mandarin systems, no systematic comparison of the different approaches, or of their scalability, has been proposed. The novelty of this correspondence is mainly experimental. In this work, all the MLP front-ends recently developed at multiple sites are described and compared in a systematic manner on a 100-hour setup. The study covers the two main directions along which the MLP features have evolved: the use of different input representations to the MLP and the use of more complex MLP architectures beyond the three-layer perceptron. The results are analyzed in terms of confusion matrices, and the paper discusses a number of novel findings that the comparison reveals. Furthermore, the two best front-ends used in the GALE 2008 evaluation, referred to as MLP1 and MLP2, are studied in a more complex LVCSR system in order to investigate their scalability in terms of the amount of training data (from 100 hours to 1600 hours) and the parametric system complexity (maximum likelihood versus discriminative training, speaker adaptive training, lattice-level combination). Results on 5 hours of evaluation data from the GALE project reveal that the MLP features consistently produce improvements in the range of 15%–23% relative at the different steps of a multipass system when compared to mel-frequency cepstral coefficient (MFCC) and PLP features, suggesting that the improvements scale with the amount of data and with the complexity of the system. The integration of those features into the GALE 2008 evaluation system provides very competitive performance compared to other Mandarin systems. © 2011, IEEE.

Details

Title Transcribing Mandarin Broadcast Speech Using Multi-Layer Perceptron Acoustic Features

Author(s) Valente, Fabio ; Magimai.-Doss, Mathew ; Plahl, Christian ; Ravuri, Suman ; Wang, Wen

Published in IEEE Transactions on Audio, Speech, and Language Processing

Volume 19

Issue 8

Pages 2439-2450

Date 2011

DOI https://doi.org/10.1109/TASL.2011.2139206

Laboratories LIDIAP

The document appears in Scientific production and competences > STI - Faculté des sciences et techniques de l'ingénieur > IEM - Institute of Electrical and Micro Engineering > LIDIAP - Laboratoire de l'IDIAP
Scientific production and competences > Euler Center for Signal Processing
Peer-reviewed publications
Work produced at EPFL
Journal articles
Published

Record creation date 2013-12-19
