Utilisation de PLSI en recherche d'information (Using PLSI in Information Retrieval)

Authors: Chappelier, Jean-Cedric; Eckard, Emmanuel
Date: 2009-07-06
URL: https://infoscience.epfl.ch/handle/20.500.14299/41083

Abstract: The PLSI model ("Probabilistic Latent Semantic Indexing") offers a document indexing scheme based on probabilistic latent category models. It has found applications in diverse fields, notably in information retrieval (IR). Nevertheless, PLSI cannot process documents not seen during parameter inference, a major liability for handling queries in IR. A method known as "folding-in" makes it possible to circumvent this problem up to a point, but it has its own weaknesses. The present paper introduces a new document-query similarity measure for PLSI, based on language models, that entirely avoids the problem of query projection. We compare this similarity measure to Fisher kernels, the state-of-the-art similarity measures for PLSI. Moreover, we present an evaluation of PLSI on a particularly large training set of almost 7500 documents and over one million term occurrences, created from the TREC-AP collection.

Keywords: Information retrieval; PLSI; Language modelling
Type: text::conference output::conference poster not in proceedings