A high-performance distributed hash table for peer-to-peer information retrieval

Klemm, Fabius

doi:10.5075/epfl-thesis-4012

Klemm, Fabius

2008

Télécharger

Formats

Format
BibTeX
MARCXML
TextMARC
MARC
DublinCore
EndNote
NLM
RefWorks
RIS

Files

Résumé

This thesis describes our research results in the context of peer-to-peer information retrieval (P2P-IR). One goal in P2P-IR is to build a search engine for the World Wide Web (WWW) that runs on up to hundreds of thousands or even millions computers distributed all over the world. The idea is not only to distribute the content, e.g., web pages, but also an index for searching this content. The main focus of this thesis lies on designing an overlay network that is capable of transporting data between the different parts of such a distributed search engine. We built a Distributed Hash Table (DHT) that is able to sustain and efficiently handle high traffic loads, which are typically generated by a distributed IR application. We first analyze the behavior of a state-of-the-art DHT under heavy load and show that a DHT can suffer a so-called "congestion collapse" if it does not have a congestion control mechanism. We propose different ways of integrating congestion control into DHTs to achieve stable behavior of the system under heavy load. We then look into mechanisms for increasing the throughput of a DHT by adapting its routing function to perceived congestion. We propose an algorithm that avoids congested parts of the DHT and thus increases the throughput by exploiting underutilized resources. We evaluate our fully operational DHT prototype using a ModelNet cluster and the PlanetLab testbed to assess the performance of the proposed algorithms. Furthermore, we describe an architecture of a P2P search engine for the WWW. We propose mechanisms to create a highly distributed document index. The main idea is to split the index into very small parts by using so-called highly discriminative keys. We thus achieve an extremely distributed storage of the index, which allows for high parallelism during indexing and querying. We evaluate the performance of our indexing approach with a P2P-IR prototype, which is built on top of our high-performance DHT.

Détails

Titre A high-performance distributed hash table for peer-to-peer information retrieval

Auteur(s) Klemm, Fabius

Directeur(s)

Aberer, Karl

Pagination 128

Date 2008

Editeur Lausanne, EPFL

Mots-clés (libres)

Peer-to-Peer Information Retrieval; Distributed Hash Table; Congestion Control; moteur de recherche Peer-to-Peer; table de hachage distribuée; contrôle de congestion

Langue Anglais

DOI https://doi.org/10.5075/epfl-thesis-4012

Autres identifiant(s) urn: urn:nbn:ch:bel-epfl-thesis4012-3

Laboratoires LSIR

Le document apparaît dans Production scientifique et compétences > I&C - Faculté Informatique & Communications > IINFCOM > LSIR - Laboratoire de systèmes d'information répartis
Production scientifique et compétences > Thèses EPFL
Travail produit à l'EPFL
Publié
Thèses

Date de création de la notice 2007-12-13

Files

Résumé

Détails

PDF