PLSI: The True Fisher Kernel and Beyond
IID Processes, Information Matrix and Model Identification in PLSI
The Probabilistic Latent Semantic Indexing (PLSI) model, introduced by T. Hofmann (1999), has engendered applications in numerous fields, notably document classification and information retrieval. In this context, the Fisher kernel was found to be an appropriate document similarity measure. However, the kernels published so far contain unjustified features, some of which hinder their performance. Furthermore, PLSI is not generative for unknown documents, a shortcoming usually remedied by "folding them in" to the PLSI parameter space.