Using SiteRank for P2P Web Retrieval
Studies of the Web graph at the granularity of documents have revealed many interesting link distributions. Similarly, studies of the Web graph at the granularity of Web sites, the so-called hostgraph, have revealed relationships among hosts based on linkage and co-citation. However, to the best of our knowledge, the graph of Web sites has not been exploited for ranking in search engines. In this paper, we first motivate the need for a SiteGraph abstraction. We then derive SiteRank, a ranking of the general importance of the Web sites in such a graph, and show that SiteRank follows a power-law distribution. As our experimental data set, we use our campus Web, which comprises over two million documents. We uncover interesting relationships between PageRank and SiteRank. Based on these results and observations, we conclude that decomposing the computation of global Web document rankings by means of SiteRank is a very promising approach for a decentralized P2P search system. In particular, by sharing SiteRanks, peers would not only be able to compute global document rankings efficiently in a decentralized manner, but would also obtain a new means of fighting link spamming. Our experimental results support the proposed ideas.
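
The abstract does not say how SiteRank is computed, so the following is only a minimal sketch under stated assumptions: a site is identified by the host name of a URL, the SiteGraph weights each site-to-site edge by the number of document links between the two sites, and SiteRank is obtained by a PageRank-style power iteration over that weighted graph. The function names (`site_of`, `build_site_graph`, `site_rank`), the example URLs, and the damping factor 0.85 are illustrative choices, not taken from the paper.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def site_of(url):
    """Map a document URL to its site; assumed here to be the lowercased host name."""
    return urlsplit(url).hostname or ""


def build_site_graph(page_links):
    """Collapse (source_url, target_url) document links into a weighted site graph.

    Returns (sites, edges), where edges[s][t] counts document links from site s
    to site t. Links within one site are dropped (an assumption of this sketch).
    """
    sites = set()
    edges = defaultdict(lambda: defaultdict(int))
    for src, dst in page_links:
        s, t = site_of(src), site_of(dst)
        sites.update((s, t))
        if s != t:
            edges[s][t] += 1
    return sites, edges


def site_rank(sites, edges, damping=0.85, iterations=100):
    """PageRank-style power iteration over the weighted site graph."""
    n = len(sites)
    rank = {s: 1.0 / n for s in sites}
    for _ in range(iterations):
        # Rank held by sites without outgoing site links is spread uniformly.
        dangling = sum(rank[s] for s in sites if not edges.get(s))
        new_rank = {s: (1.0 - damping) / n + damping * dangling / n for s in sites}
        for s, targets in edges.items():
            total = sum(targets.values())
            for t, weight in targets.items():
                new_rank[t] += damping * rank[s] * weight / total
        rank = new_rank
    return rank


# Hypothetical document-level links among three sites.
links = [
    ("http://a.example/1", "http://b.example/"),
    ("http://a.example/2", "http://b.example/x"),
    ("http://c.example/", "http://b.example/"),
    ("http://b.example/", "http://a.example/1"),
    ("http://a.example/1", "http://a.example/2"),  # intra-site link, ignored
]
sites, edges = build_site_graph(links)
ranks = site_rank(sites, edges)
for site in sorted(ranks, key=ranks.get, reverse=True):
    print(site, round(ranks[site], 4))
```

In a decentralized setting, each peer could in principle run the aggregation step over the documents it holds and exchange only the much smaller site-level scores, which is the kind of sharing the abstract refers to; how the paper actually combines SiteRank with per-site document rankings is not covered by this sketch.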