A high-performance distributed hash table for peer-to-peer information retrieval

Klemm, Fabius

doi:10.5075/epfl-thesis-4012

doctoral thesis

A high-performance distributed hash table for peer-to-peer information retrieval

2008

This thesis describes our research results in the context of peer-to-peer information retrieval (P2P-IR). One goal in P2P-IR is to build a search engine for the World Wide Web (WWW) that runs on up to hundreds of thousands or even millions computers distributed all over the world. The idea is not only to distribute the content, e.g., web pages, but also an index for searching this content. The main focus of this thesis lies on designing an overlay network that is capable of transporting data between the different parts of such a distributed search engine. We built a Distributed Hash Table (DHT) that is able to sustain and efficiently handle high traffic loads, which are typically generated by a distributed IR application. We first analyze the behavior of a state-of-the-art DHT under heavy load and show that a DHT can suffer a so-called "congestion collapse" if it does not have a congestion control mechanism. We propose different ways of integrating congestion control into DHTs to achieve stable behavior of the system under heavy load. We then look into mechanisms for increasing the throughput of a DHT by adapting its routing function to perceived congestion. We propose an algorithm that avoids congested parts of the DHT and thus increases the throughput by exploiting underutilized resources. We evaluate our fully operational DHT prototype using a ModelNet cluster and the PlanetLab testbed to assess the performance of the proposed algorithms. Furthermore, we describe an architecture of a P2P search engine for the WWW. We propose mechanisms to create a highly distributed document index. The main idea is to split the index into very small parts by using so-called highly discriminative keys. We thus achieve an extremely distributed storage of the index, which allows for high parallelism during indexing and querying. We evaluate the performance of our indexing approach with a P2P-IR prototype, which is built on top of our high-performance DHT.

Name

EPFL_TH4012.pdf

Access type

restricted

Size

1.17 MB

Format

Adobe PDF

Checksum (MD5)

a3640ae38d3131b87f16c78e29052850