Repository logo

Infoscience

  • English
  • French
Log In
Logo EPFL, École polytechnique fédérale de Lausanne

Infoscience

  • English
  • French
Log In
  1. Home
  2. Academic and Research Output
  3. EPFL thesis
  4. A high-performance distributed hash table for peer-to-peer information retrieval
 
doctoral thesis

A high-performance distributed hash table for peer-to-peer information retrieval

Klemm, Fabius  
2008

This thesis describes our research results in the context of peer-to-peer information retrieval (P2P-IR). One goal in P2P-IR is to build a search engine for the World Wide Web (WWW) that runs on up to hundreds of thousands or even millions computers distributed all over the world. The idea is not only to distribute the content, e.g., web pages, but also an index for searching this content. The main focus of this thesis lies on designing an overlay network that is capable of transporting data between the different parts of such a distributed search engine. We built a Distributed Hash Table (DHT) that is able to sustain and efficiently handle high traffic loads, which are typically generated by a distributed IR application. We first analyze the behavior of a state-of-the-art DHT under heavy load and show that a DHT can suffer a so-called "congestion collapse" if it does not have a congestion control mechanism. We propose different ways of integrating congestion control into DHTs to achieve stable behavior of the system under heavy load. We then look into mechanisms for increasing the throughput of a DHT by adapting its routing function to perceived congestion. We propose an algorithm that avoids congested parts of the DHT and thus increases the throughput by exploiting underutilized resources. We evaluate our fully operational DHT prototype using a ModelNet cluster and the PlanetLab testbed to assess the performance of the proposed algorithms. Furthermore, we describe an architecture of a P2P search engine for the WWW. We propose mechanisms to create a highly distributed document index. The main idea is to split the index into very small parts by using so-called highly discriminative keys. We thus achieve an extremely distributed storage of the index, which allows for high parallelism during indexing and querying. We evaluate the performance of our indexing approach with a P2P-IR prototype, which is built on top of our high-performance DHT.

  • Files
  • Details
  • Metrics
Type
doctoral thesis
DOI
10.5075/epfl-thesis-4012
Author(s)
Klemm, Fabius  
Advisors
Aberer, Karl  
Jury

Ernst Biersack, Anja Feldmann, Jean-Yves Le Boudec

Date Issued

2008

Publisher

EPFL

Publisher place

Lausanne

Public defense year

2008-02-01

Thesis number

4012

Total of pages

128

Subjects

Peer-to-Peer Information Retrieval

•

Distributed Hash Table

•

Congestion Control

•

moteur de recherche Peer-to-Peer

•

table de hachage distribuée

•

contrôle de congestion

EPFL units
LSIR  
Faculty
IC  
Section
IC-SSC  
School
IIF  
Doctoral School
EDHP
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/15561
Logo EPFL, École polytechnique fédérale de Lausanne
  • Contact
  • infoscience@epfl.ch

  • Follow us on Facebook
  • Follow us on Instagram
  • Follow us on LinkedIn
  • Follow us on X
  • Follow us on Youtube
AccessibilityLegal noticePrivacy policyCookie settingsEnd User AgreementGet helpFeedback

Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, tous droits réservés