Utilisation de PLSI en recherche d'information (Using PLSI in Information Retrieval)

Chappelier, Jean-Cedric; Eckard, Emmanuel

2009

Abstract

The PLSI model (“Probabilistic Latent Semantic Indexing”) offers a document indexing scheme based on probabilistic latent category models. It has been applied in diverse fields, notably in information retrieval (IR). Nevertheless, PLSI cannot process documents not seen during parameter inference, a major liability for queries in IR. A method known as “folding-in” makes it possible to circumvent this problem up to a point, but has its own weaknesses. The present paper introduces a new document-query similarity measure for PLSI, based on language models, that entirely avoids the problem of query projection. We compare this similarity to Fisher kernels, the state-of-the-art similarities for PLSI. Moreover, we present an evaluation of PLSI on a particularly large training set, created from the TREC–AP collection, of almost 7,500 documents and over one million term occurrences.
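The abstract does not give the exact form of the proposed measure, so the following is only a minimal sketch under stated assumptions, not the authors' method. It fits the asymmetric PLSI aspect model P(w|d) = Σ_z P(w|z) P(z|d) by EM, then contrasts two ways of handling a query: folding-in (re-running EM on the query with P(w|z) fixed, to obtain P(z|q)) and a plain query-likelihood score log P(q|d) computed directly from the trained document parameters, which needs no query projection at all. The function names (`plsi_em`, `fold_in`, `query_log_likelihood`) and the toy corpus are hypothetical; the paper's actual measure may differ (for instance by adding smoothing), and the Fisher-kernel comparison is not reproduced. The E-step materialises a dense documents × topics × words array, which is fine for toy data but not for a collection the size of TREC–AP.

```python
import numpy as np

TINY = 1e-300  # guards against division by zero when a normaliser vanishes


def plsi_em(counts, n_topics, n_iter=200, seed=0):
    """Fit the asymmetric PLSI aspect model P(w|d) = sum_z P(w|z) P(z|d) by EM.

    counts: (n_docs, n_words) matrix of term counts n(d, w).
    Returns p_w_z (n_topics, n_words) and p_z_d (n_docs, n_topics).
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # E-step: posterior P(z|d,w), shape (n_docs, n_topics, n_words).
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]
        post = joint / np.maximum(joint.sum(axis=1, keepdims=True), TINY)
        # Expected counts n(d,w) * P(z|d,w).
        expected = counts[:, None, :] * post
        # M-step: re-estimate P(w|z) and P(z|d) from the expected counts.
        p_w_z = expected.sum(axis=0)
        p_w_z /= np.maximum(p_w_z.sum(axis=1, keepdims=True), TINY)
        p_z_d = expected.sum(axis=2)
        p_z_d /= np.maximum(p_z_d.sum(axis=1, keepdims=True), TINY)
    return p_w_z, p_z_d


def fold_in(query_counts, p_w_z, n_iter=50):
    """Folding-in: estimate P(z|q) for an unseen query, keeping P(w|z) fixed."""
    n_topics = p_w_z.shape[0]
    p_z_q = np.full(n_topics, 1.0 / n_topics)
    for _ in range(n_iter):
        joint = p_z_q[:, None] * p_w_z  # (n_topics, n_words)
        post = joint / np.maximum(joint.sum(axis=0, keepdims=True), TINY)
        p_z_q = (query_counts[None, :] * post).sum(axis=1)
        p_z_q /= max(p_z_q.sum(), TINY)
    return p_z_q


def query_log_likelihood(query_counts, p_w_z, p_z_d, eps=1e-12):
    """Score every document by log P(q|d) = sum_w n(q,w) log sum_z P(w|z) P(z|d).

    The query is never projected: only the trained document parameters are used.
    eps (an assumption) keeps the log finite for words a document never predicts.
    """
    p_w_d = p_z_d @ p_w_z  # (n_docs, n_words)
    return np.log(p_w_d + eps) @ query_counts  # (n_docs,)


# Hypothetical toy corpus: two groups of documents over disjoint vocabularies.
counts = np.array([[3, 2, 0, 0],
                   [2, 3, 0, 0],
                   [0, 0, 3, 2],
                   [0, 0, 2, 3]], dtype=float)
p_w_z, p_z_d = plsi_em(counts, n_topics=2)
query = np.array([1, 1, 0, 0], dtype=float)  # uses words 0 and 1 only
scores = query_log_likelihood(query, p_w_z, p_z_d)
p_z_q = fold_in(query, p_w_z)
print("ranking:", np.argsort(-scores))
print("P(z|q) by folding-in:", p_z_q)
```

In this sketch, folding-in requires a separate EM loop for every query, and its output depends on the iteration count and initialisation, whereas the query-likelihood score is a single matrix product over parameters already learned for the collection. That contrast is one plausible reading of why a language-model-based measure can sidestep query projection.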

Details

Conference 16th Conference on Natural Language Processing (Traitement Automatique des Langues Naturelles, TALN 2009), Senlis, 24-26 June 2009

Keywords

Information retrieval; PLSI; Language modelling

Laboratories LIA

Work produced at EPFL
Published
Posters

Record creation date 2009-07-06