An Ad Hoc Information Retrieval Perspective on PLSI through Language Model Identification

Ten years ago, Probabilistic Latent Semantic Indexing (PLSI) opened the way to probabilistic latent semantic representations of documents. It has since found applications in a number of fields, including ad hoc Information Retrieval. However, the model's inherent limitations hinder its use on documents not seen during learning. This paper proposes a new document–query similarity for PLSI, based on language modeling, that allows queries to be used in PLSI without the usual folding-in phase. We compare this similarity to Fisher kernels, the state-of-the-art approach for PLSI. From this perspective, we complete the study of the impact of the Fisher Information Matrix and of how the latent-topic and word components contribute to the kernel's performance. Finally, we present an evaluation of PLSI trained on a corpus of over one million word occurrences from the TREC–AP evaluation collection, which is particularly large for parameter estimation in the PLSI framework.
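
The language-model view of PLSI can be illustrated with a minimal sketch. Assuming a trained model with topic-word distributions P(w|z) and document-topic distributions P(z|d), each training document d defines a unigram language model P(w|d) = Σ_z P(z|d) P(w|z), and a query can be scored by its log-likelihood under that model, so no folding-in (re-estimating a topic mixture P(z|q) for the query by EM) is needed. This illustrates the general idea only and is not necessarily the exact similarity proposed in the paper; the function names, the dictionary layout, the toy parameters and the `floor` constant guarding against log(0) are all hypothetical.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical parameter layout (assumed for illustration, not from the paper):
#   p_w_given_z[z][w] = P(w | z), one dict per latent topic z
#   p_z_given_d[d][z] = P(z | d), one list of topic weights per training document d
TopicWordDists = Sequence[Dict[str, float]]
DocTopicDists = Dict[str, Sequence[float]]


def plsi_word_prob(word: str, doc: str,
                   p_w_given_z: TopicWordDists, p_z_given_d: DocTopicDists) -> float:
    """P(w | d) = sum over z of P(z | d) * P(w | z): the PLSI mixture for a training document."""
    return sum(p_z * topic.get(word, 0.0)
               for p_z, topic in zip(p_z_given_d[doc], p_w_given_z))


def query_log_likelihood(query: List[str], doc: str,
                         p_w_given_z: TopicWordDists, p_z_given_d: DocTopicDists,
                         floor: float = 1e-12) -> float:
    """log P(q | d) for a bag-of-words query; `floor` is an assumed guard against log(0)."""
    return sum(math.log(max(plsi_word_prob(w, doc, p_w_given_z, p_z_given_d), floor))
               for w in query)


def rank_documents(query: List[str],
                   p_w_given_z: TopicWordDists, p_z_given_d: DocTopicDists) -> List[str]:
    """Rank training documents by how likely their language model is to generate the query."""
    scores = {d: query_log_likelihood(query, d, p_w_given_z, p_z_given_d)
              for d in p_z_given_d}
    return sorted(scores, key=scores.get, reverse=True)


# Toy model: two topics ("finance", "sports") and two training documents.
p_w_given_z = [{"stock": 0.5, "market": 0.5}, {"game": 0.5, "team": 0.5}]
p_z_given_d = {"finance": [0.9, 0.1], "sports": [0.1, 0.9]}
print(rank_documents(["stock", "market"], p_w_given_z, p_z_given_d))  # ['finance', 'sports']
```

Because the query only enters through each document's P(w|d), it never needs a topic mixture of its own; the trade-off in this sketch is that a query word absent from every topic receives the arbitrary floor probability.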

Presented at:
2nd International Conference on the Theory of Information Retrieval, Cambridge, 10-12 September 2009
Springer-Verlag New York, 175 Fifth Ave, New York, NY 10010, USA
