Infoscience

EPFL, École polytechnique fédérale de Lausanne

doctoral thesis

Scalable peer-to-peer Web search using highly discriminative keys

Luu, Toan Vinh  
2007

Standard general-purpose Web retrieval relies on centralized search engines that do not realistically scale when applied to the exponentially growing number of documents available on the Web. By taking advantage of the resource sharing principle, Peer-to-Peer (P2P) techniques represent a promising architectural alternative for building decentralized Web search engines offering true Web-size scalability, provided that enough peers are available. However, in all such P2P approaches proposed so far, excessive network bandwidth consumption during retrieval, caused by the necessary transmission of possibly very long posting lists, was identified as the major bottleneck for implementing truly scalable P2P full-text Web retrieval. The main objective of the present research is thus to find a decentralized indexing/retrieval strategy that fully exploits the distributed computation possibilities provided by P2P networks, but keeps the required network bandwidth consumption scalable, while guaranteeing an acceptable retrieval quality. To address this problem we introduce a novel indexing/retrieval model based on Highly Discriminative Keys (HDKs), which correspond to carefully selected indexing terms and term sets, each associated with a posting list truncated to the top-k most relevant documents for that key. Using HDKs for indexing thus increases the number of indexing features but, at the same time, strictly limits the size of the associated posting lists. When combined with an adequate retrieval model, this leads to strongly reduced network traffic. More precisely, our experimental results show that HDK-based indexing and retrieval lead to storage and bandwidth requirements that remain manageable as the document collection grows, while preserving a fully acceptable retrieval quality.
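The core idea of the abstract, keys made of terms and term sets whose posting lists are cut to the top-k documents, can be illustrated with a minimal single-machine sketch. This is not the thesis's algorithm: the function names `build_index` and `lookup`, the `df_max` threshold used to decide which terms are combined into pairs, and the use of raw term frequency as the relevance score are all assumptions made for illustration, and the distribution of keys over peers is omitted entirely.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_index(docs, k=2, df_max=2):
    """Toy key index: each key (a set of terms) maps to at most k documents.

    Assumption (hypothetical rule, not from the abstract): terms occurring in
    more than df_max documents are too common on their own, so they are
    additionally combined into two-term keys.
    """
    # Per-document term frequencies, used as a stand-in relevance score.
    tf = {doc_id: Counter(text.lower().split()) for doc_id, text in docs.items()}

    # Document frequency of every single term.
    df = Counter(term for counts in tf.values() for term in counts)

    postings = defaultdict(list)
    for doc_id, counts in tf.items():
        frequent = sorted(t for t in counts if df[t] > df_max)
        keys = [frozenset([t]) for t in counts]
        keys += [frozenset(pair) for pair in combinations(frequent, 2)]
        for key in keys:
            score = sum(counts[t] for t in key)
            postings[key].append((score, doc_id))

    # Truncate every posting list to its k best-scoring documents.
    return {
        key: [d for _, d in sorted(plist, key=lambda p: (-p[0], p[1]))[:k]]
        for key, plist in postings.items()
    }


def lookup(index, query):
    """Rank documents by the keys of the query that they appear under."""
    terms = sorted(set(query.lower().split()))
    keys = [frozenset([t]) for t in terms]
    keys += [frozenset(pair) for pair in combinations(terms, 2)]
    hits = defaultdict(int)
    for key in keys:
        for doc_id in index.get(key, []):
            hits[doc_id] += len(key)  # larger keys count as stronger evidence
    return sorted(hits, key=lambda d: (-hits[d], d))


docs = {
    "d1": "peer to peer web search",
    "d2": "web search engine",
    "d3": "web search web index",
    "d4": "peer network",
}
index = build_index(docs, k=2, df_max=2)
print(index[frozenset({"web", "search"})])
print(lookup(index, "web search"))
```

In this sketch the number of keys grows (single terms plus pairs), but no posting list ever exceeds k entries, which is the trade-off the abstract describes: more indexing features in exchange for a strict bound on what must be transferred per key at query time.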

Name: EPFL_TH3945.pdf
Access type: restricted
Size: 1.49 MB
Format: Adobe PDF
Checksum (MD5): 200b9b2b35853a8d356d3a11dbcfb7eb


Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved