conference paper

Query-Driven Indexing for Scalable Peer-to-Peer Text Retrieval

Skobeltsyn, Gleb  
•
Luu, Toan
•
Podnar Žarko, Ivana
2009
Future Generation Computer Systems-The International Journal Of Grid Computing Theory Methods And Applications
Infoscale: the Second International Conference on Scalable Information Systems

We present a query-driven algorithm for the distributed indexing of large document collections within structured P2P networks. To cope with bandwidth consumption, which has been identified as the major problem for the standard P2P approach with single-term indexing, we leverage a distributed index that stores at most the top-k document references, and only for carefully chosen combinations of indexing terms. In addition, since the number of possible term combinations extracted from a document collection can be very large, we propose to use query statistics to index only those combinations that users frequently request. Thus, by avoiding the maintenance of superfluous indexing information, we achieve a substantial reduction in bandwidth and storage. A specific activation mechanism continuously updates the indexing information according to changes in the query distribution, resulting in an efficient, constantly evolving query-driven indexing structure. We show that the size of the index and the generated indexing/retrieval traffic remain manageable even for web-size document collections, at the price of a marginal loss in precision for rare queries. Our theoretical analysis and experimental results provide convincing evidence for the feasibility of the query-driven indexing strategy for large-scale P2P text retrieval. Moreover, our experiments confirm that the retrieval performance is only slightly lower than that obtained with state-of-the-art centralized query engines.
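
The idea summarized in the abstract can be illustrated with a minimal, single-process sketch. It is not the authors' implementation: a plain dictionary stands in for the DHT, and the class name `QueryDrivenIndex`, the constants `TOP_K` and `ACTIVATION_THRESHOLD`, the term-count scoring, the eager indexing of single terms, and the subset-based lookup are all assumptions made for illustration. The sketch only shows the general pattern: a multi-term key is indexed once queries for it become frequent enough, and each key keeps at most the top-k document references.

```python
from collections import Counter
from itertools import combinations

TOP_K = 2                 # assumed cap on document references stored per key
ACTIVATION_THRESHOLD = 2  # assumed number of requests before a multi-term key is indexed


class QueryDrivenIndex:
    """Toy stand-in for a DHT-backed index (hypothetical, for illustration only)."""

    def __init__(self, documents):
        self.documents = {d: text.lower().split() for d, text in documents.items()}
        self.query_counts = Counter()  # query statistics per term combination
        self.index = {}                # key (frozenset of terms) -> top-k doc ids
        # Assumption: single-term keys are indexed eagerly, truncated to top-k.
        vocabulary = {t for tokens in self.documents.values() for t in tokens}
        for term in vocabulary:
            self.index[frozenset([term])] = self._top_k(frozenset([term]))

    def _top_k(self, key):
        # Assumed scoring: total occurrences of the key's terms,
        # restricted to documents that contain every term of the key.
        scored = []
        for doc_id, tokens in self.documents.items():
            if all(term in tokens for term in key):
                scored.append((sum(tokens.count(term) for term in key), doc_id))
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [doc_id for _, doc_id in scored[:TOP_K]]

    def query(self, text):
        terms = frozenset(text.lower().split())
        self.query_counts[terms] += 1
        # Activation: index a multi-term key once it has been requested often enough.
        if (len(terms) > 1 and terms not in self.index
                and self.query_counts[terms] >= ACTIVATION_THRESHOLD):
            self.index[terms] = self._top_k(terms)
        # Answer from the largest indexed subsets of the query terms.
        for size in range(len(terms), 0, -1):
            hits = [self.index[frozenset(c)]
                    for c in combinations(sorted(terms), size)
                    if frozenset(c) in self.index]
            if hits:
                merged = []
                for refs in hits:
                    for doc_id in refs:
                        if doc_id not in merged:
                            merged.append(doc_id)
                return merged
        return []


docs = {
    "d1": "peer to peer text retrieval",
    "d2": "peer review of text",
    "d3": "distributed hash table retrieval",
    "d4": "peer peer networks",
}
idx = QueryDrivenIndex(docs)
first = idx.query("peer retrieval")   # answered from truncated single-term lists
second = idx.query("peer retrieval")  # the combination is now activated and indexed
```

In this toy run the first query is answered by merging the truncated single-term lists, which also returns documents matching only one of the terms. After the second request the combination is activated, and the answer comes from the dedicated key, which holds only the document containing both terms. A rarely requested combination never reaches the threshold and keeps relying on the truncated lists, which loosely mirrors the loss in precision for rare queries that the abstract mentions.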

Type: conference paper
DOI: 10.1016/j.future.2008.03.006
Web of Science ID: WOS:000260238300010
Author(s): Skobeltsyn, Gleb; Luu, Toan; Podnar Žarko, Ivana; Rajman, Martin; Aberer, Karl
Date Issued: 2009
Published in: Future Generation Computer Systems-The International Journal Of Grid Computing Theory Methods And Applications
Volume: 25
Start page: 89
End page: 99
Subjects: P2P • DHT • IR • Query-Driven Indexing • Scalability
URL: http://www.infoscale2007.org/
Editorial or Peer reviewed: REVIEWED
Written at: EPFL
EPFL units: LSIR
Event name: Infoscale: the Second International Conference on Scalable Information Systems
Event place: Suzhou, China
Event date: June 6-8, 2007
Available on Infoscience: May 2, 2007
Use this identifier to reference this record: https://infoscience.epfl.ch/handle/20.500.14299/6607