Infoscience

Poster

An Ad Hoc Information Retrieval Perspective on PLSI through language model identification

Ten years ago, PLSI opened the road to probabilistic latent semantic representations of documents. It led to a number of applications in different fields, including ad hoc Information Retrieval. However, inherent limitations hinder its use on documents not seen during learning. This paper proposes a new document–query similarity for PLSI based on language modeling that allows queries to be used in PLSI without the usual folding-in phase. We compare this similarity to Fisher kernels, the state-of-the-art approach for PLSI. In this perspective, we complete the study of the impact of the Fisher Information Matrix, and of how latent-topics and word components contribute to the kernel performance. We furthermore present an evaluation of PLSI with learning performed on a corpus of over one million word occurrences, coming from the TREC–AP evaluation collection, a particularly large corpus for parameter estimation in the PLSI framework.

Fulltext

  • There is no available fulltext. Please contact the lab or the authors.

Related material