Utilisation de PLSI en recherche d'information

The PLSI model (“Probabilistic Latent Semantic Indexing”) offers a document indexing scheme based on probabilistic latent category models. It entailed applications in diverse fields, notably in information retrieval (IR). Nevertheless, PLSI cannot process documents not seen during parameter inference, a major liability for queries in IR. A method known as “folding-in” allows to circumvent this problem up to a point, but has its own weaknesses. The present paper introduces a new document-query similarity measure for PLSI based on language models that entirely avoids the problem a query projection. We compare this similarity to Fisher kernels, the state of the art similarities for PLSI. Moreover, we present an evaluation of PLSI on a particularly large training set of almost 7500 document and over one million term occurrence large, created from the TREC–AP collection.


Presented at:
16ème conference sur le Traitement Automatique des Langues Naturelles, Senlis, 24-26 June 2009
Year:
2009
Keywords:
Laboratories:




 Record created 2009-07-06, last modified 2018-09-13

n/a:
Download fulltext
PDF

Rate this document:

Rate this document:
1
2
3
 
(Not yet reviewed)