Abstract

Ranking scientific publications involves a variety of specialized resources (e.g. author or citation indexes) that are usually difficult to exploit within standard general-purpose search engines. Such engines typically operate on large-scale heterogeneous document collections, in which the required specialized resources are not available for all documents. Integrating such resources into specialized information retrieval engines is therefore important to meet community-specific user expectations, which strongly influence the perception of relevance within the considered community. In this perspective, this paper extends the notion of ranking with various methods exploiting different types of bibliographic knowledge, which represent a crucial resource for measuring the relevance of scientific publications. We experimentally evaluated the adequacy of two such ranking methods (one based on freshness, i.e. the publication date, and the other on a novel index, the download-Hirsch index, based on download frequencies) for information retrieval from the CERN scientific publication database in the domain of particle physics. Our experiments show that (i) the considered specialized ranking methods are promising candidates for extending the baseline ranking (which relies on the download frequency), as both lead to fairly small search result overlaps; and (ii) extending the baseline ranking with the freshness-based ranking method significantly improves retrieval quality: a relative increase of 16.2% for the Mean Reciprocal Rank and of 5.1% for Success@10 (the estimated probability of finding at least one relevant document among the top ten retrieved) when a local rank sum is used for aggregation.
We plan to validate these results further through additional experiments with the specialized ranking method based on the download-Hirsch index, in order to improve the performance of our aggregative approach.
