Fast latent semantic indexing of spoken documents by using self-organizing maps

Kurimo, Mikko

doi:10.1109/ICASSP.2000.859331

2000

Abstract

This paper describes a new latent semantic indexing (LSI) method for spoken audio documents. The framework is indexing broadcast news from radio and TV as a combination of large vocabulary continuous speech recognition (LVCSR), natural language processing (NLP) and information retrieval (IR). For indexing, the documents are presented as vectors of word counts, whose dimensionality is rapidly reduced by random mapping (RM). The obtained vectors are projected into the latent semantic subspace determined by SVD, where the vectors are then smoothed by a self-organizing map (SOM). The smoothing by the closest document clusters is important here, because the documents are often short and have a high word error rate (WER). As the clusters in the semantic subspace reflect the news topics, the SOMs provide an easy way to visualize the index and query results and to explore the database. Test results are reported for TREC's spoken document retrieval databases.
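The indexing steps named in the abstract (word-count vectors, random mapping, SVD projection, SOM smoothing) can be sketched roughly as follows. This is a minimal illustration on toy data, not the author's implementation: the function names, the Gaussian random matrix, the 3x3 map, the training schedule, and the choice to smooth each document by averaging its two closest map units are all assumptions made for the example.

```python
import numpy as np


def random_mapping(counts, out_dim, rng):
    """Reduce word-count vectors to out_dim dimensions with a random matrix."""
    vocab_size = counts.shape[1]
    R = rng.standard_normal((vocab_size, out_dim))
    R /= np.linalg.norm(R, axis=0, keepdims=True)  # unit-length columns
    return counts @ R


def lsi_projection(X, k):
    """Project the rows of X onto the top-k right singular vectors of X."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:k].T


def train_som(X, grid, epochs, rng, lr0=0.5, sigma0=1.0):
    """Train a small rectangular SOM; returns the codebook (units x dim)."""
    rows, cols = grid
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)],
                      dtype=float)
    # Assumption: initialise the codebook from randomly chosen documents.
    codebook = X[rng.integers(0, len(X), size=rows * cols)].astype(float)
    n_steps = epochs * len(X)
    step = 0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                 # decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-3    # shrinking neighbourhood
            bmu = np.argmin(np.linalg.norm(codebook - X[i], axis=1))
            d2 = np.sum((coords - coords[bmu]) ** 2, axis=1)
            h = np.exp(-d2 / (2.0 * sigma ** 2))
            codebook += lr * h[:, None] * (X[i] - codebook)
            step += 1
    return codebook


def smooth_with_som(X, codebook, n_closest=2):
    """Assumed smoothing rule: mean of the n_closest codebook vectors."""
    dists = np.linalg.norm(X[:, None, :] - codebook[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :n_closest]
    return codebook[nearest].mean(axis=1)


# Toy data standing in for recognised transcripts: 40 documents, 500 words.
rng = np.random.default_rng(0)
counts = rng.poisson(0.3, size=(40, 500)).astype(float)

reduced = random_mapping(counts, out_dim=50, rng=rng)    # 500 -> 50 dims
latent = lsi_projection(reduced, k=5)                    # 50 -> 5 dims
codebook = train_som(latent, grid=(3, 3), epochs=20, rng=rng)
smoothed = smooth_with_som(latent, codebook)             # one vector per doc
```

Applying the SVD after random mapping means the decomposition runs on a 40x50 matrix here rather than on the full 40x500 count matrix, which is the speed-up the title refers to. A query would go through the same mapping and projection before being compared with the smoothed document vectors.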

Details

Title Fast latent semantic indexing of spoken documents by using self-organizing maps

Author(s) Kurimo, Mikko

Published in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing ICASSP'2000

Volume 4

Pages 2425-2428

Presented at IEEE International Conference on Acoustics, Speech and Signal Processing ICASSP'2000, Istanbul, Turkey

Date 2000

Keywords (free)

speech

Note IDIAP-RR 99-20

DOI https://doi.org/10.1109/ICASSP.2000.859331

Additional link URL; Related documents

Laboratories LIDIAP

This document appears in Scientific output and expertise > STI - Faculté des sciences et techniques de l'ingénieur > IEM - Institute of Electrical and Micro Engineering > LIDIAP - Laboratoire de l'IDIAP
Scientific output and expertise > Euler Center for Signal Processing
Conference papers
Work produced at EPFL
Published

Record created 2006-03-10
