Abstract

Language models for speech recognition are generally trained on text corpora. Since these corpora do not contain the disfluencies found in natural speech, there is a train/test mismatch when these models are applied to conversational speech. In this work we investigate a language model (LM) designed to model these disfluencies as a syntactic process. By modeling self-corrections we obtain an improvement over our baseline syntactic model. We also obtain a 30% relative reduction in perplexity from the best performing standard N-gram model when we interpolate it with our syntactically derived models.
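For context, combining an N-gram LM with another model by linear interpolation, and measuring the result by perplexity, follows a standard recipe. The sketch below is only illustrative: the abstract does not describe the interpolation scheme, so the `prob(word, history)` interface and the weight `lam` are assumptions, not the authors' implementation.

```python
import math

def interpolated_logprob(word, history, ngram_lm, syntactic_lm, lam=0.5):
    """Linearly interpolate two language models' conditional probabilities.

    Both `ngram_lm` and `syntactic_lm` are assumed to expose a
    prob(word, history) method returning P(word | history); `lam` is the
    interpolation weight, normally tuned on held-out data.
    """
    p = lam * ngram_lm.prob(word, history) + (1.0 - lam) * syntactic_lm.prob(word, history)
    return math.log(p)

def perplexity(words, ngram_lm, syntactic_lm, lam=0.5):
    """Perplexity of a word sequence under the interpolated model:
    exp(-(1/N) * sum_i log P(w_i | w_1 .. w_{i-1}))."""
    total_logprob = 0.0
    for i, w in enumerate(words):
        total_logprob += interpolated_logprob(w, words[:i], ngram_lm, syntactic_lm, lam)
    return math.exp(-total_logprob / len(words))
```

A lower perplexity on held-out conversational speech indicates the interpolated model assigns higher probability to the test data than either component alone.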
