An Ad Hoc Information Retrieval Perspective on PLSI through language model identification

Chappelier, Jean-Cedric; Eckard, Emmanuel

doi:10.1007/978-3-642-04417-5_36

conference paper

2009

Advances in Information Retrieval Theory

2nd International Conference on the Theory of Information Retrieval

Ten years ago, PLSI opened the road to probabilistic latent semantic representations of documents. It led to a number of applications in different fields, including ad hoc Information Retrieval. However, inherent limitations hinder its use on documents not seen during learning. This paper proposes a new document–query similarity for PLSI based on language modeling that allows queries to be used in PLSI without the usual folding-in phase. We compare this similarity to Fisher kernels, the state-of-the-art approach for PLSI. In this perspective, we complete the study of the impact of the Fisher Information Matrix, and of how latent-topics and word components contribute to the kernel performance. We furthermore present an evaluation of PLSI with learning performed on a corpus of over one million word occurrences, coming from the TREC–AP evaluation collection, a particularly large corpus for parameter estimation in the PLSI framework.

Type

conference paper

DOI

10.1007/978-3-642-04417-5_36

Web of Science ID

WOS:000271806000035

Authors

Chappelier, Jean-Cedric

Eckard, Emmanuel

Publication date

2009

Publisher

Springer-Verlag New York, Ms Ingrid Cunningham, 175 Fifth Ave, New York, NY 10010 USA

Published in

Advances in Information Retrieval Theory

Series title/Series vol.

Lecture Notes in Computer Science

Start page

346

End page

349

Subjects

PLSI

Information retrieval...

Language modelling

EPFL units

LIA

Event name	Event place	Event date
2nd International Conference on the Theory of Information Retrieval	Cambridge	10-12 September 2009

Available on Infoscience

July 6, 2009

Use this identifier to reference this record

https://infoscience.epfl.ch/handle/20.500.14299/41085