000095721 001__ 95721
000095721 005__ 20180317094509.0
000095721 037__ $$aREP_WORK
000095721 245__ $$aQuery-Driven Indexing for Peer-to-Peer Text Retrieval
000095721 269__ $$a2006
000095721 260__ $$c2006
000095721 336__ $$aReports
000095721 520__ $$aWe present a query-driven algorithm for the distributed indexing of large document collections within structured P2P networks. To cope with the bandwidth consumption problem that has been identified as the major issue for the standard distributed approach using single-term indexing, we leverage a distributed index that stores top-k document references only for carefully chosen indexing term combinations. In addition, since the number of possible term combinations extracted from a document collection can still be very large, we propose to use query statistics to index only those combinations that are frequently requested by users. Thus, by avoiding the maintenance of superfluous indexing information, we achieve a substantial reduction in bandwidth and storage. A specific activation mechanism is applied to take into account changes in the query distribution, resulting in an efficient, constantly evolving query-driven indexing structure. Moreover, our approach facilitates adjusting the indexing load according to the resources provided by the peers in the network. We claim that the size of the index and the generated indexing/retrieval traffic remain manageable even for web-size document collections, at the price of a marginal loss in recall for rare queries. Our theoretical analysis and experimental results provide convincing evidence of the feasibility of the query-driven indexing strategy for large-scale P2P text retrieval. Furthermore, our experiments confirm that the retrieval performance is only slightly lower than that obtained with state-of-the-art centralized query engines, such as Google or Yahoo.
000095721 6531_ $$aP2P
000095721 6531_ $$aIR
000095721 6531_ $$aDHT
000095721 6531_ $$aInformation Retrieval
000095721 6531_ $$aIndexing
000095721 700__ $$0240003$$aSkobeltsyn, Gleb$$g155081
000095721 700__ $$aLuu, Toan
000095721 700__ $$aPodnar, Ivana
000095721 700__ $$0240004$$aRajman, Martin$$g112354
000095721 700__ $$0240941$$aAberer, Karl$$g134136
000095721 8564_ $$s297201$$uhttps://infoscience.epfl.ch/record/95721/files/Query-Driven%20Indexing%20for%20Peer-to-Peer%20Text%20Retrieval.pdf$$zn/a
000095721 909CO $$ooai:infoscience.tind.io:95721$$preport$$pIC
000095721 909C0 $$0252004$$pLSIR$$xU10405
000095721 937__ $$aLSIR-REPORT-2006-014
000095721 973__ $$aEPFL$$sPUBLISHED
000095721 980__ $$aREPORT