Stream fusion for multi-stream automatic speech recognition

Sagha, Hesam; Li, Feipeng; Variani, Ehsan; Millan, Jose Del R.; Chavarriaga, Ricardo; Schuller, Bjoern

doi:10.1007/s10772-016-9357-1

2016

Abstract

Multi-stream automatic speech recognition (MS-ASR) has been shown to improve recognition performance in noisy conditions. In such a system, the generation and the fusion of the streams are the essential parts, and they need to be designed so as to reduce the effect of noise on the final decision. This paper shows how to improve the performance of MS-ASR by addressing two questions: (1) how many streams should be combined, and (2) how should they be combined. First, we propose a novel approach based on stream reliability to select the number of streams to be fused. Second, we introduce a fusion method based on Parallel Hidden Markov Models. Applying the method to two datasets (TIMIT and RATS) with different noise types, we show improved MS-ASR performance.
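
The two questions in the abstract (how many streams to fuse, and how to fuse them) can be illustrated with a minimal sketch. This is not the authors' method: the record does not describe their reliability measure or their Parallel Hidden Markov Model fusion, so the sketch assumes a generic reliability threshold for stream selection and a reliability-weighted log-linear combination of per-frame class posteriors. The function names, the threshold value, and the example numbers are all hypothetical.

```python
import math


def select_streams(reliability, threshold=0.5):
    """Return indices of streams whose reliability meets the threshold.

    Assumption: reliability is one score per stream, higher is better.
    If no stream passes, fall back to the single most reliable stream.
    """
    keep = [i for i, r in enumerate(reliability) if r >= threshold]
    if not keep:
        keep = [max(range(len(reliability)), key=lambda i: reliability[i])]
    return keep


def fuse_posteriors(posteriors, reliability, selected):
    """Fuse per-stream class posteriors for one frame.

    Uses a reliability-weighted sum of log posteriors (log-linear fusion),
    a stand-in technique and not the Parallel HMM fusion of the paper.
    """
    total = sum(reliability[i] for i in selected)
    if total > 0:
        weights = {i: reliability[i] / total for i in selected}
    else:
        # Degenerate case: all selected scores are zero, so weight equally.
        weights = {i: 1.0 / len(selected) for i in selected}

    n_classes = len(posteriors[0])
    log_scores = [
        sum(weights[i] * math.log(max(posteriors[i][c], 1e-12)) for i in selected)
        for c in range(n_classes)
    ]

    # Renormalise so the fused scores form a probability distribution.
    peak = max(log_scores)
    exps = [math.exp(s - peak) for s in log_scores]
    norm = sum(exps)
    return [e / norm for e in exps]


# Hypothetical example: three streams, three classes, one frame.
posteriors = [
    [0.7, 0.2, 0.1],  # stream 0
    [0.6, 0.3, 0.1],  # stream 1
    [0.1, 0.1, 0.8],  # stream 2 (assumed to be corrupted by noise)
]
reliability = [0.9, 0.8, 0.2]

selected = select_streams(reliability)
fused = fuse_posteriors(posteriors, reliability, selected)
```

In this made-up example the low-reliability stream is excluded before fusion, so its conflicting posterior does not influence the fused decision.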

Details

Title Stream fusion for multi-stream automatic speech recognition

Author(s) Sagha, Hesam ; Li, Feipeng ; Variani, Ehsan ; Millan, Jose Del R. ; Chavarriaga, Ricardo ; Schuller, Bjoern

Published in International Journal Of Speech Technology

Pagination 7

Volume 19

Issue 4

Pages 669-675

Date 2016

Publisher New York, Springer Verlag

ISSN 1381-2416

Keywords

Multi-stream speech recognition; Performance monitor; Classifier ensemble creation and fusion

DOI https://doi.org/10.1007/s10772-016-9357-1


Laboratories CNBI

Record Appears in Scientific production and competences > STI - School of Engineering > STI Archives > CNBI - Defitech Foundation Chair in Brain-Machine Interface
Peer-reviewed publications
Work produced at EPFL
Journal Articles
Published

Record creation date 2017-01-24
