Noisy Text Clustering

Grangier, David; Vinciarelli, Alessandro

2004

Abstract

This work presents document clustering experiments performed over noisy texts (i.e. texts that have been extracted through an automatic process such as speech or character recognition). The effect of recognition errors on different clustering techniques is measured by comparing the results obtained with clean (manually typed texts) and noisy (automatic speech transcripts affected by a $30\%$ Word Error Rate) versions of the TDT2 corpus ($\sim600$ hours of spoken data from broadcast news). The results suggest that clustering can be performed over noisy data with an acceptable performance degradation.
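
As a reading aid for the $30\%$ Word Error Rate figure quoted above, the following is a minimal sketch of how such a rate is commonly computed: the word-level edit distance between a reference transcript and a recognizer output, divided by the number of reference words. The function name `word_error_rate` and the example sentences are hypothetical illustrations, not taken from the report or from the TDT2 corpus, and the report's own scoring procedure may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: 10 reference words, 2 substitutions and 1 deletion.
clean = "clustering can be performed over noisy data with acceptable degradation"
noisy = "clustering can be preformed over nosy data with acceptable"
wer = word_error_rate(clean, noisy)
```

In this made-up example three of the ten reference words are affected, so the rate matches the order of magnitude reported for the noisy transcripts.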

Details

Title Noisy Text Clustering

Author(s) Grangier, David ; Vinciarelli, Alessandro

Date 2004

Publisher IDIAP

Keywords

Speech

Laboratories LIDIAP

Record Appears in Scientific production and competences > STI - School of Engineering > IEM - Institut d'Electricité et de Microtechnique > LIDIAP - L'IDIAP Laboratory
Scientific production and competences > Euler Center for Signal Processing
Work produced at EPFL
Technical Reports
Published

Record creation date 2006-03-10
