Building a peer-to-peer full-text Web search engine with highly discriminative keys
Web search engines built on top of peer-to-peer (P2P) overlay networks promise to enable attractive search scenarios that operate at a large scale. However, the design of effective indexing techniques for extremely large document collections still raises a number of open technical challenges. Resource sharing, self-organization, and low maintenance costs are favorable properties of P2P overlays from the perspective of large-scale search, but they also bring new problems: potentially huge bandwidth consumption during both indexing and querying, and the unavailability of global document collection statistics. Since a straightforward application of P2P solutions to Web search generates unscalable indexing and search traffic, we propose a novel indexing technique that maintains a global key index in structured P2P overlays. Keys are highly discriminative terms and term sets that appear in a restricted number of collection documents, which limits the size of the global index while ensuring scalable search cost. Our experimental results show reasonable indexing costs, while the retrieval quality is comparable to standard centralized solutions with TF-IDF ranking. Our indexing scheme represents a contribution toward realistic P2P Web search engines that opens up access to virtually unlimited resources, well beyond the capacity of today's best centralized Web search engines.
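To make the notion of a key concrete, the following minimal Python sketch builds a small in-memory key index. It is an illustration under stated assumptions, not the indexing algorithm evaluated in this work: the function names `build_key_index` and `lookup`, the threshold `df_max`, the bound `max_key_size`, and the use of document-level co-occurrence are all hypothetical choices made for the example. A term or term set is stored as a key only if it occurs in at most `df_max` documents; more frequent ones are combined into larger candidate sets.

```python
from collections import defaultdict
from itertools import combinations


def build_key_index(docs, df_max=2, max_key_size=2):
    """Map each key (a frozenset of terms) to the ids of documents containing it.

    Assumption for this sketch: a key is "discriminative" when it occurs in
    at most df_max documents. Both parameters are hypothetical.
    """
    # Posting lists for single terms.
    frontier = defaultdict(set)
    for doc_id, text in docs.items():
        for term in set(text.lower().split()):
            frontier[frozenset([term])].add(doc_id)

    index = {}
    for size in range(1, max_key_size + 1):
        frequent = {}
        for key, ids in frontier.items():
            if len(ids) <= df_max:
                index[key] = ids      # rare enough: store as a key
            else:
                frequent[key] = ids   # too frequent: candidate for expansion
        if size == max_key_size:
            break
        # Combine frequent sets into candidates one term larger, keeping only
        # candidates whose every subset of the current size is also frequent.
        frontier = {}
        for a, b in combinations(frequent, 2):
            key = a | b
            if len(key) != size + 1:
                continue
            if not all(frozenset(s) in frequent for s in combinations(key, size)):
                continue
            ids = frequent[a] & frequent[b]
            if ids:
                frontier[key] = ids
    return index


def lookup(index, query):
    """Union the posting lists of every indexed key contained in the query."""
    terms = sorted(set(query.lower().split()))
    hits = set()
    for size in range(1, len(terms) + 1):
        for combo in combinations(terms, size):
            # In a P2P deployment this would be an overlay lookup (assumption).
            hits |= index.get(frozenset(combo), set())
    return hits


if __name__ == "__main__":
    docs = {
        1: "peer to peer search",
        2: "peer search engine",
        3: "peer overlay network",
        4: "search engine index",
    }
    index = build_key_index(docs, df_max=2)
    print(sorted(lookup(index, "peer search")))
```

In this toy collection, "peer" and "search" each occur in three documents and are therefore not stored alone, while the pair occurs in only two documents and becomes a key. Every stored posting list is bounded by `df_max`, which is the property that keeps the per-key index size and the per-query transfer small in this sketch.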