Infoscience

École polytechnique fédérale de Lausanne (EPFL)

 
doctoral thesis

Scalable peer-to-peer Web search using highly discriminative keys

Luu, Toan Vinh  
2007

Standard general-purpose Web retrieval relies on centralized search engines that do not realistically scale when applied to the exponentially growing number of documents available on the Web. By taking advantage of the resource sharing principle, Peer-to-Peer (P2P) techniques represent a promising architectural alternative for building decentralized Web search engines offering true Web-size scalability, provided that enough peers are available. However, in all such P2P approaches proposed so far, excessive network bandwidth consumption during retrieval, caused by the necessary transmission of possibly very long posting lists, was identified as the major bottleneck for implementing truly scalable P2P full-text Web retrieval. The main objective of the present research is thus to find a decentralized indexing/retrieval strategy that fully exploits the distributed computation possibilities provided by P2P networks, but keeps the required network bandwidth consumption scalable, while guaranteeing an acceptable retrieval quality. To address this problem we introduce a novel indexing/retrieval model based on Highly Discriminative Keys (HDKs), which correspond to carefully selected indexing terms and term sets associated with posting lists truncated to the top-k most relevant documents with respect to the associated key. Using HDKs for indexing thus increases the number of indexing features but, at the same time, strictly limits the size of the associated posting lists. When combined with an adequate retrieval model, this leads to strongly reduced network traffic. More precisely, our experimental results show that HDK-based indexing and retrieval lead to storage and bandwidth requirements that remain manageable with respect to the growth of the document collection, while preserving a fully acceptable retrieval quality.
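The abstract does not spell out how keys are selected or scored. The Python sketch below is a minimal, hypothetical illustration of the general idea that posting lists are truncated to the top-k documents and indexing features are expanded from single terms to term sets. It rests on several assumptions that are not taken from the thesis:

- A key counts as discriminative when at most k documents contain all of its terms.
- Only non-discriminative keys are expanded into larger term sets, up to s_max terms.
- Documents are scored by summed term frequency.
- A query is answered by looking up every subset of its terms.

The names `build_hdk_index` and `hdk_query`, and the parameters `k` and `s_max`, are illustrative only.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Key = FrozenSet[str]       # an indexing key: a single term or a term set
Posting = Tuple[int, int]  # (doc_id, score)


def build_hdk_index(docs: Dict[int, List[str]], k: int, s_max: int) -> Dict[Key, List[Posting]]:
    """Toy HDK-style indexer (assumed scheme, not the thesis implementation).

    A key is treated as discriminative when at most k documents contain all of
    its terms. Only non-discriminative keys are expanded into larger term sets
    (up to s_max terms), and every posting list is truncated to its top-k documents.
    """
    tf = {d: Counter(terms) for d, terms in docs.items()}
    index: Dict[Key, List[Posting]] = {}
    non_disc: Set[Key] = set()  # non-discriminative keys of the previous size

    for size in range(1, s_max + 1):
        # Apriori-style candidates: every (size-1)-subset must be non-discriminative.
        candidates: Set[Key] = set()
        for counts in tf.values():
            for combo in combinations(sorted(counts), size):
                cand = frozenset(combo)
                if size == 1 or all(cand - {t} in non_disc for t in cand):
                    candidates.add(cand)

        next_non_disc: Set[Key] = set()
        for cand in candidates:
            # Assumed score: summed term frequency of the key's terms in the document.
            hits = [(d, sum(c[t] for t in cand)) for d, c in tf.items() if cand.issubset(c)]
            hits.sort(key=lambda p: (-p[1], p[0]))
            index[cand] = hits[:k]  # posting list truncated to the top-k documents
            if len(hits) > k:
                next_non_disc.add(cand)  # too frequent: expand it at the next size
        non_disc = next_non_disc
        if not non_disc:
            break
    return index


def hdk_query(index: Dict[Key, List[Posting]], query: List[str], s_max: int, k: int) -> Tuple[List[int], int]:
    """Look up every query subset of up to s_max terms and merge the truncated lists.

    Returns the top-k document ids and the number of postings fetched, used here
    as a rough proxy for the network traffic of one query.
    """
    terms = sorted(set(query))
    scores: Counter = Counter()
    fetched = 0
    for size in range(1, min(s_max, len(terms)) + 1):
        for combo in combinations(terms, size):
            for doc_id, score in index.get(frozenset(combo), []):
                scores[doc_id] += score
                fetched += 1
    ranked = sorted(scores, key=lambda d: (-scores[d], d))[:k]
    return ranked, fetched


docs = {
    1: ["p2p", "web", "search"],
    2: ["p2p", "web", "index"],
    3: ["p2p", "search", "index"],
    4: ["web", "search", "engine"],
}
index = build_hdk_index(docs, k=2, s_max=2)
print(hdk_query(index, ["p2p", "search"], s_max=2, k=2))  # ([1, 3], 6)
```

In this sketch, each key lookup returns at most k postings. The traffic of a query is therefore bounded by k times the number of query subsets looked up, independently of how many documents contain each term. In the toy run, the two-term query fetches 6 postings from 3 keys.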

Type
doctoral thesis
DOI
10.5075/epfl-thesis-3945
Author(s)
Luu, Toan Vinh  
Advisors
Rajman, Martin  
Jury

Monika Henzinger, Jacques Savoy, Gerhard Weikum

Date Issued

2007

Publisher

EPFL

Publisher place

Lausanne

Public defense date

2007-12-07

Thesis number

3945

Number of pages

124

Subjects

Distributed Information Retrieval • Highly Discriminative Keys • P2P System • Web Search Engine • search engines (moteurs de recherche) • peer-to-peer (pair-à-pair) • highly discriminative keys (clés hautement discriminatives) • decentralized system (système décentralisé)

EPFL units
LIA  
Faculty
IC  
Section
IC-SIN  
School
IIF  
Doctoral School
EDHP
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/11948

Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved