Infoscience

Conference paper

Query Optimization in Context of Pseudo Relevant Documents

In the conventional vector space model for information retrieval, the generated query vector is often too imperfect to retrieve precisely the documents the user desires. In this paper, we present a stochastic approach for optimizing the query vector without user involvement. We explore the document search space using particle swarm optimization and exploit the search space of possibly relevant and non-relevant documents to adapt the query vector. The proposed method improves retrieval accuracy by optimizing the query vector generated in the conventional vector space model under various term weighting strategies, including TF-IDF and document length normalization. Our experimental results on two collections, Medline and Cranfield, show that the query vector adapted in pseudo-relevant documents performs better than the classical vector space model. We achieved an improvement of 3-4% in Mean Average Precision (MAP) and 5-10% in precision at lower recall. Further expansion of the search space into pseudo non-relevant documents does not lead to significant improvement, but a proper representation of pseudo non-relevant documents leaves scope for future work to better guide the optimization of the query vector.
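To make the idea concrete, the following is a minimal sketch of query adaptation with particle swarm optimization over pseudo-relevant documents. It is not the paper's implementation: the abstract does not state the fitness function, swarm size, or coefficients, so the choices here are assumptions made for illustration only. In particular, the fitness (mean cosine similarity to the top-k initially retrieved documents), the parameters `k`, `w`, `c1`, `c2`, and all function names are hypothetical.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length term-weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def pseudo_relevant(query, docs, k):
    """Indices of the k documents ranked highest for the original query."""
    ranked = sorted(range(len(docs)),
                    key=lambda i: cosine(query, docs[i]), reverse=True)
    return ranked[:k]


def fitness(candidate, docs, rel_idx):
    """ASSUMED objective: mean cosine similarity to the pseudo-relevant set."""
    return sum(cosine(candidate, docs[i]) for i in rel_idx) / len(rel_idx)


def pso_adapt_query(query, docs, k=2, n_particles=10, n_iter=30,
                    w=0.7, c1=1.5, c2=1.5, seed=0):
    """Adapt a query vector with a basic global-best PSO (assumed settings)."""
    rng = random.Random(seed)
    rel_idx = pseudo_relevant(query, docs, k)
    dim = len(query)

    # Particle 0 starts at the original query, so the best solution found
    # can never score below the unadapted query on this objective.
    positions = [list(query)] + [
        [max(0.0, q + rng.uniform(-0.5, 0.5)) for q in query]
        for _ in range(n_particles - 1)
    ]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in positions]
    pbest_fit = [fitness(p, docs, rel_idx) for p in positions]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                velocities[i][d] = (
                    w * velocities[i][d]
                    + c1 * r1 * (pbest[i][d] - positions[i][d])
                    + c2 * r2 * (gbest[d] - positions[i][d])
                )
                # Term weights are kept non-negative.
                positions[i][d] = max(0.0, positions[i][d] + velocities[i][d])
            f = fitness(positions[i], docs, rel_idx)
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = positions[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = positions[i][:], f
    return gbest, gbest_fit, rel_idx


# Toy term-weight vectors (made up for illustration, not from Medline or Cranfield).
docs = [[1.0, 1.0, 0.0, 0.0], [1.0, 0.8, 0.1, 0.0],
        [0.0, 0.0, 1.0, 1.0], [0.0, 0.1, 0.9, 1.0]]
query = [1.0, 0.0, 0.0, 0.0]
adapted, adapted_fit, rel_idx = pso_adapt_query(query, docs)
```

The adapted vector would then replace the original query for a second retrieval pass. Seeding one particle with the original query is a design choice of this sketch, not something the abstract specifies; it simply guarantees the search does not return a vector that scores below the starting point on the assumed objective.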
