Infoscience

research article

Query-Driven Indexing for Scalable Peer-to-Peer Text Retrieval

Skobeltsyn, Gleb • Luu, Toan • Podnar Zarko, Ivana • Rajman, Martin • Aberer, Karl
2009
Future Generation Computer Systems

In this paper, we present a query-driven indexing and retrieval strategy for efficient full-text retrieval from large document collections distributed within a structured P2P network. Our indexing strategy is based on two important properties: (1) the generated distributed index stores posting lists for carefully chosen combinations of indexing terms that occur frequently in user queries, and (2) posting lists that contain too many document references are truncated to a bounded number of their top-ranked elements. These two properties guarantee acceptable latency and bandwidth requirements, essentially because the number of indexing term combinations remains scalable and the posting lists transmitted during retrieval never exceed a constant size. A novel index update mechanism efficiently handles the addition of new documents to the collection. Thus, the generated distributed index corresponds to a constantly evolving query-driven indexing structure that efficiently follows the current information needs of the users and changes in the document collection. We show that the size of the index and the generated indexing and retrieval traffic remain manageable even for Web-size document collections, at the price of a marginal loss in precision for rare queries. Our theoretical analysis and experimental results provide convincing evidence for the feasibility of the query-driven indexing strategy for large-scale P2P text retrieval.
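
The two properties named in the abstract can be illustrated with a small, single-machine sketch. It is not the authors' algorithm: the abstract does not say how term combinations are chosen or how documents are ranked, so the popularity threshold `min_freq`, the key size limit `max_size`, the term-frequency score, and all function names below are assumptions made for illustration, and a plain dictionary stands in for the distributed hash table.

```python
from collections import Counter
from itertools import combinations


def frequent_keys(query_log, min_freq=2, max_size=2):
    """Collect term combinations that occur in at least min_freq queries.

    Assumption: popularity is a raw count over the query log.
    """
    counts = Counter()
    for query in query_log:
        terms = sorted(set(query.lower().split()))
        for size in range(1, max_size + 1):
            for combo in combinations(terms, size):
                counts[combo] += 1
    return {key for key, count in counts.items() if count >= min_freq}


def build_index(docs, keys, max_postings=2):
    """Build one posting list per key, truncated to the top max_postings.

    Assumption: a document's score is the summed frequency of the key's terms.
    """
    index = {}
    for key in keys:
        postings = []
        for doc_id, text in docs.items():
            terms = text.lower().split()
            if all(term in terms for term in key):
                score = sum(terms.count(term) for term in key)
                postings.append((score, doc_id))
        # Highest score first; ties broken by document id for determinism.
        postings.sort(key=lambda p: (-p[0], p[1]))
        index[key] = [doc_id for _, doc_id in postings[:max_postings]]
    return index


def lookup(index, query, max_size=2):
    """Return the posting list of the largest indexed key contained in the query."""
    terms = sorted(set(query.lower().split()))
    for size in range(min(max_size, len(terms)), 0, -1):
        for combo in combinations(terms, size):
            if combo in index:
                return index[combo]
    return []
```

Because every stored list holds at most `max_postings` entries, a lookup transfers a bounded amount of data regardless of collection size, which is the intuition behind the bandwidth claim. A query whose term combinations were never frequent falls back to a smaller key or returns nothing, which mirrors the stated loss in precision for rare queries.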

Type: research article
DOI: 10.1016/j.future.2008.03.006
Web of Science ID: WOS:000260238300010
Author(s): Skobeltsyn, Gleb; Luu, Toan; Podnar Zarko, Ivana; Rajman, Martin; Aberer, Karl
Date Issued: 2009
Published in: Future Generation Computer Systems
Volume: 25
Issue: 1
Start page: 89
End page: 99
Subjects: P2P • DHT • IR • Text retrieval • P2PIR • Scalability • Query-driven indexing • Distributed index • Index updates
Editorial or Peer reviewed: REVIEWED
Written at: EPFL
EPFL units: LSIR
Use this identifier to reference this record: https://infoscience.epfl.ch/handle/20.500.14299/27896

Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved