Scalable Peer-to-Peer Web Retrieval with Highly Discriminative Keys

Podnar, Ivana; Rajman, Martin; Luu, Toan; Klemm, Fabius; Aberer, Karl

2006

Abstract

The suitability of Peer-to-Peer (P2P) approaches for full-text web retrieval has recently been questioned because of the claimed unacceptable bandwidth consumption induced by retrieval from very large document collections. In this contribution we present a novel indexing/retrieval model that achieves high-performance, cost-efficient retrieval by indexing with highly discriminative keys (HDKs) stored in a distributed global index maintained in a structured P2P network. HDKs correspond to carefully selected terms and term sets that appear in only a small number of documents in the collection. We provide a theoretical analysis of the scalability of our retrieval model and report experimental results obtained with our HDK-based P2P retrieval engine. These results show that, despite increased indexing costs, the total traffic generated with the HDK approach is significantly smaller than that generated by distributed single-term indexing strategies. Furthermore, our experiments show that the retrieval performance obtained with a random set of real queries is comparable to that of a centralized, single-term solution using the best state-of-the-art BM25 relevance computation scheme. Finally, our scalability analysis demonstrates that the HDK approach can scale to large networks of peers indexing web-size document collections, thus opening the way towards viable, truly decentralized web retrieval.
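
As an illustration of the idea of indexing with terms and term sets that appear in few documents, the following is a minimal sketch and not the authors' implementation. The function name `select_discriminative_keys`, the document-frequency threshold `df_max`, the key-size limit `max_size`, and the rule that a term set is skipped when it already contains a selected smaller key are all assumptions made for this example; the abstract does not specify them.

```python
from collections import defaultdict
from itertools import combinations


def select_discriminative_keys(docs, df_max=1, max_size=2):
    """Return {key: set of doc ids} for keys found in at most df_max documents.

    A key is a frozenset of terms. Keys are built in order of increasing
    size, and a larger key is skipped if it contains an already selected
    smaller key (an assumed minimality rule, for illustration only).
    """
    # Tokenize naively on whitespace; a real engine would do much more.
    term_sets = {doc_id: set(text.lower().split()) for doc_id, text in docs.items()}
    selected = {}
    for size in range(1, max_size + 1):
        postings = defaultdict(set)
        for doc_id, terms in term_sets.items():
            for combo in combinations(sorted(terms), size):
                key = frozenset(combo)
                # Skip term sets that extend an already selected key.
                if any(smaller < key for smaller in selected):
                    continue
                postings[key].add(doc_id)
        # Keep only the keys whose document frequency is small enough.
        for key, doc_ids in postings.items():
            if len(doc_ids) <= df_max:
                selected[key] = doc_ids
    return selected


docs = {
    1: "peer network index",
    2: "peer network query",
    3: "peer index query",
}
keys = select_discriminative_keys(docs, df_max=1, max_size=2)
for key, doc_ids in sorted(keys.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(key), sorted(doc_ids))
```

In this toy collection every single term occurs in at least two documents, so no single term qualifies under `df_max=1`; only term pairs that occur in exactly one document are selected. In a distributed setting, each selected key and its short posting list would be stored in the global index, which is what keeps the transferred posting lists small.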

Details

Title Scalable Peer-to-Peer Web Retrieval with Highly Discriminative Keys

Author(s) Podnar, Ivana; Rajman, Martin; Luu, Toan; Klemm, Fabius; Aberer, Karl

Date 2006

Keywords

peer-to-peer information systems; distributed information retrieval; scalability

Laboratories LSIR

Record Appears in Scientific production and competences > I&C - School of Computer and Communication Sciences > IINFCOM > LSIR - Distributed Information Systems Laboratory
Work produced at EPFL
Technical Reports
Published

Record creation date 2006-07-17
