
Abstract

The selection and aggregation of ranking criteria has become an important topic in information retrieval as search grows more specialized and as the volume of electronically available information increases. In this context, document ranking has undergone a shift from purely content-based ranking criteria to combined ranking schemes that integrate additional features, such as document popularity or impact, or link-based techniques, which became widely applied after the success of the PageRank algorithm.

In this thesis we experiment with the selection and aggregation of ranking criteria, aiming to increase the performance of a specialized scientific search engine. Building on theoretical foundations from several research fields, including social choice theory, information retrieval and digital libraries, we focus on ranking in databases of scientific publications in the field of High Energy Physics, identifying criteria that are pertinent for ranking scientific publications and selecting and aggregating them within a unified framework.

The first issue that we address is thus the identification and selection of ranking criteria for scientific documents in High Energy Physics. These criteria include traditional information retrieval relevance based on word similarity, but also document usage, citation counts and links. In this context we present a novel ranking criterion combining the Hirsch index and download counts, which we call the d-Hirsch index; it takes document download counts into account and assigns the corresponding Hirsch index directly to a document. Criteria selection is then based on a correlation analysis between ranked lists of documents. We propose that correlations over entire document listings be replaced by a measure of overlap on the top-k of the resulting lists, which better reflects independence in terms of ranking. To this end we propose a new overlap measure, the Mean Average Overlap (MeanO); a minimal sketch of this measure is given below.

The second issue that we address is the aggregation framework for ranked lists of documents, where we focus on applying linear combination and models trained with machine learning techniques based on logistic regression. As the individual scores used for ranking are not necessarily comparable to each other, we describe a unified model for normalizing ranking scores before their aggregation, based on statistical properties of the underlying ranking criteria.

Another contribution of our work is the creation of a reference set of relevance judgments for information retrieval experimentation in databases of scientific publications in the domain of High Energy Physics. Until now, no such resource has been available that would allow the evaluation of specialized information retrieval in this domain. We propose a method for the automated generation of such reference sets, assuming that document relevance is determined by document usage. Our approach corresponds to a modification of the pooling method, in which the validation of relevance judgments is not done by domain experts but is inferred from document usage observed in past user behavior.

The developed ranking models, methods and algorithms were validated within the framework of the d-Rank prototype, developed in collaboration between EPFL and CERN, which aims at the design of document retrieval mechanisms and interfaces combining a multitude of individual specialized ranking criteria. This prototype has been integrated within the CERN institutional repository of scientific publications.
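To make the idea behind the Mean Average Overlap concrete, the following minimal Python sketch averages the top-k overlap between two ranked lists over all cut-off depths up to a chosen depth. The exact formulation used in the thesis may differ; the function names, the example document identifiers and the choice of averaging over every depth from 1 to k are assumptions made here purely for illustration.

```python
from typing import Hashable, Sequence


def overlap_at_k(a: Sequence[Hashable], b: Sequence[Hashable], k: int) -> float:
    """Fraction of shared items among the top-k of two ranked lists."""
    return len(set(a[:k]) & set(b[:k])) / k


def mean_average_overlap(a: Sequence[Hashable], b: Sequence[Hashable], depth: int) -> float:
    """Average the top-k overlap over all cut-off depths 1..depth (assumed MeanO form)."""
    return sum(overlap_at_k(a, b, k) for k in range(1, depth + 1)) / depth


# Two hypothetical ranking criteria that agree near the top but diverge lower down:
ranking_by_downloads = ["d1", "d2", "d3", "d4", "d5"]
ranking_by_citations = ["d1", "d3", "d2", "d7", "d9"]
print(mean_average_overlap(ranking_by_downloads, ranking_by_citations, depth=5))
```

A low value of such a measure would indicate that two criteria rank the head of the result list differently and are therefore good candidates for aggregation.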
We carried out several experiments to support the thesis that rank aggregation provides superior performance for specialized information retrieval compared to ranking with single criteria. We experimented with the CERN Document Server repository as well as with artificially generated data, showing that rank aggregation can contribute to increased performance of a specialized information retrieval system compared to a baseline of the best individually performing ranking criteria.

We showed, in an experiment with artificially generated data, that if the scores of individual ranking criteria follow a comparable right-skewed distribution, then aggregating such ranking criteria can improve ranking performance in terms of Mean Average Precision. We reproduced this result in an experiment carried out on a real-world data set created from the CERN database of scientific publications, showing that aggregating freshness and download frequency yielded better ranking results than rankings based on download frequency or freshness separately, as measured by Mean Reciprocal Rank and S@10 (both measures are sketched below). We observed that local aggregates, obtained from the documents that were part of the result set, performed significantly better than global aggregates obtained by involving all documents in a collection. These results were confirmed by a 10-fold randomized cross-validation test for both the Mean Reciprocal Rank and the S@10 evaluation measures.

We compared the performance of the d-Hirsch index with existing ranking criteria at the CERN Document Server and with aggregates obtained through linear combination of individual scores and through logistic regression. We did not observe significantly better ranking results when ranking documents with the d-Hirsch index or its aggregates compared to the other ranking criteria in the test set. We suggested that selecting ranking criteria based on a novel selection criterion, the mean average overlap between ranked lists, provides a justified basis for criteria selection.
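The two evaluation measures used in these experiments can be summarised in a short sketch: Mean Reciprocal Rank averages the reciprocal rank of the first relevant document over all queries, and S@10 (success at 10) is the fraction of queries that return at least one relevant document in the top ten results. The code below is a generic illustration of these standard definitions, not code from the d-Rank prototype; the variable names and sample data are hypothetical.

```python
from typing import Hashable, Sequence, Set, Tuple

Run = Tuple[Sequence[Hashable], Set[Hashable]]  # (ranked result list, relevant documents)


def reciprocal_rank(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """Reciprocal of the rank of the first relevant document, 0 if none is retrieved."""
    for position, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(runs: Sequence[Run]) -> float:
    """Average the reciprocal rank over all queries."""
    return sum(reciprocal_rank(ranked, relevant) for ranked, relevant in runs) / len(runs)


def success_at_k(runs: Sequence[Run], k: int = 10) -> float:
    """Fraction of queries with at least one relevant document in the top k."""
    return sum(1 for ranked, relevant in runs if set(ranked[:k]) & relevant) / len(runs)


# Hypothetical result lists and relevance judgments for two queries:
runs = [
    (["d3", "d1", "d7"], {"d1"}),
    (["d5", "d2", "d9"], {"d4"}),
]
print(mean_reciprocal_rank(runs), success_at_k(runs, k=10))
```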
