Abstract

Excessive network bandwidth consumption, caused by the transmission of long posting lists, has been identified as one of the major bottlenecks for implementing distributed full-text retrieval in a Peer-to-Peer (P2P) architecture. To address this problem, we introduce a novel approach to indexing that uses highly discriminative terms and term sets, which leads to short posting lists and therefore reduces network traffic by almost one order of magnitude. In addition, we show that retrieval based on discriminative term sets provides retrieval quality comparable to standard full-text retrieval using TF-IDF ranking. Our indexing scheme is an important step towards realistic P2P retrieval systems and opens the way to virtually unlimited scalability, well beyond the capacity of today's best centralized Web search engines.
