Distributed Cache Table: Efficient Query-Driven Processing of Multi-Term Queries in P2P Networks

Skobeltsyn, Gleb; Aberer, Karl

2006

Abstract

The state-of-the-art techniques for processing multi-term queries in P2P environments are query flooding and inverted list intersection. However, it has been shown that neither method scales well enough to support full-text search over large document collections distributed among the nodes of a P2P network. Although a number of optimizations based on these techniques have been suggested recently, little evidence has been given of their scalability. In this paper we propose a novel query-driven indexing strategy that generates and maintains only those index entries that are actually used for query processing. In our approach, called Distributed Cache Table (DCT) by analogy with Distributed Hash Table (DHT), we propose abandoning the distinction between data indexing and query caching and storing result sets (caches) for the most profitable queries. DCT employs a distributed index to efficiently locate caches that can answer a given multi-term query, and broadcasts the query to all peers only if no such cache is found. Evaluations on real data and query loads show that DCT converges to a high cache-hit ratio and offers a large-scale distributed solution for storing and efficiently querying vast amounts of documents in a P2P setting. DCT reduces traffic consumption by two orders of magnitude compared with a standard distributed single-term indexing approach.
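
The lookup-then-broadcast behaviour described in the abstract can be illustrated with a small single-process sketch. It is not the authors' implementation: the class `ToyDCT`, its method names, and the in-memory "peers" are hypothetical, and it makes two simplifying assumptions that the abstract does not state. A cache is reused only when the query's term set matches exactly, and every broadcast result is cached, whereas DCT keeps caches only for the most profitable queries.

```python
class ToyDCT:
    """Toy model: each 'peer' is a dict mapping doc_id -> set of terms."""

    def __init__(self, peers):
        self.peers = peers      # list of {doc_id: set_of_terms}
        self.caches = {}        # frozenset(query terms) -> set of doc_ids
        self.broadcasts = 0     # number of queries that had to be flooded

    def _broadcast(self, terms):
        # Fallback path: ask every peer for documents containing all terms.
        self.broadcasts += 1
        hits = set()
        for docs in self.peers:
            for doc_id, doc_terms in docs.items():
                if terms <= doc_terms:
                    hits.add(doc_id)
        return hits

    def query(self, *terms):
        key = frozenset(terms)
        # Assumption: only an exact term-set match counts as a cache hit.
        if key in self.caches:
            return self.caches[key]
        result = self._broadcast(key)
        # Assumption: cache every result (no profitability check).
        self.caches[key] = result
        return result


peers = [
    {"d1": {"p2p", "cache", "index"}, "d2": {"p2p", "flooding"}},
    {"d3": {"p2p", "cache"}},
]
dct = ToyDCT(peers)
first = dct.query("p2p", "cache")    # cache miss: flooded to all peers
second = dct.query("cache", "p2p")   # same term set: served from the cache
```

In this toy, the second call does not increase `dct.broadcasts`, which is the effect the abstract attributes to a high cache-hit ratio: repeated queries stop generating network-wide traffic.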

Details

Title Distributed Cache Table: Efficient Query-Driven Processing of Multi-Term Queries in P2P Networks

Author(s) Skobeltsyn, Gleb ; Aberer, Karl

Date 2006

Keywords

P2P; DHT; query-driven indexing; caching; multi-term query processing

Laboratories LSIR

Record Appears in Scientific production and competences > I&C - School of Computer and Communication Sciences > IINFCOM > LSIR - Distributed Information Systems Laboratory
Work produced at EPFL
Technical Reports
Published

Record creation date 2006-07-18
