The gist of everything new: personalized top-k processing over web 2.0 streams

Web 2.0 portals have made content generation easier than ever, with millions of users contributing news stories in the form of posts in weblogs or short textual snippets as in Twitter. Efficient and effective filtering solutions are key to allowing users to stay tuned to this ever-growing ocean of information, releasing only relevant trickles of personal interest. In classical information filtering systems, user interests are formulated using standard IR techniques, and data from all available information sources is filtered based on a predefined absolute quality-based threshold. In contrast to this restrictive approach, which may still overwhelm the user with the returned stream of data, we envision a system which continuously keeps the user updated with only the top-k relevant new information. Freshness of data is guaranteed by considering it valid for a particular time interval, controlled by a sliding window. Considering relevance as relative to the existing pool of new information creates a highly dynamic setting. We present POL-filter, which together with our maintenance module constitutes an efficient solution to this kind of problem. We show by comprehensive performance evaluations using real-world data, obtained from a weblog crawl, that our approach brings performance gains compared to the state of the art.
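The setting described in the abstract (items valid only inside a sliding time window, with relevance judged relative to whatever else is currently in that window) can be illustrated with a naive baseline. The sketch below is not POL-filter or the maintenance module, whose details are not given here; it simply rescans the window on every query. The class name `SlidingWindowTopK`, the helper `keyword_overlap`, and the keyword-overlap scoring are all assumptions introduced for illustration.

```python
import heapq
from collections import deque
from typing import Callable, Deque, List, Set, Tuple


def keyword_overlap(profile: Set[str]) -> Callable[[str], float]:
    """Hypothetical relevance function: number of profile terms in the text."""
    def score(text: str) -> float:
        return float(len(profile & set(text.lower().split())))
    return score


class SlidingWindowTopK:
    """Naive baseline (an assumption, not the paper's method): keep every
    item from the last `window` time units and rescan them per query."""

    def __init__(self, k: int, window: float, score: Callable[[str], float]):
        self.k = k
        self.window = window
        self.score = score
        # Each entry is (timestamp, score, text), oldest first.
        self.items: Deque[Tuple[float, float, str]] = deque()

    def add(self, timestamp: float, text: str) -> None:
        # Assumes items arrive with non-decreasing timestamps.
        self.items.append((timestamp, self.score(text), text))
        self._expire(timestamp)

    def _expire(self, now: float) -> None:
        # Drop items whose validity interval has ended.
        while self.items and self.items[0][0] <= now - self.window:
            self.items.popleft()

    def top_k(self, now: float) -> List[str]:
        self._expire(now)
        best = heapq.nlargest(self.k, self.items, key=lambda item: item[1])
        return [text for _, _, text in best]


stream_filter = SlidingWindowTopK(
    k=2, window=10, score=keyword_overlap({"top-k", "stream"})
)
stream_filter.add(0, "top-k stream processing")
stream_filter.add(1, "cooking recipes")
stream_filter.add(5, "stream filtering")
early = stream_filter.top_k(now=5)
late = stream_filter.top_k(now=10)
```

In this toy run, `late` contains "cooking recipes" even though it matches no profile term: once the best item expires, a previously excluded item moves into the top-k. That is the dynamic behaviour the abstract contrasts with an absolute threshold, and it is why a naive rescan per query becomes costly at scale.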

Published in:
Proceedings of the 19th ACM Conference on Information and Knowledge Management, CIKM 2010
Presented at:
19th ACM International Conference on Information and Knowledge Management, Toronto, Ontario, Canada, October 26-30, 2010

 Record created 2011-01-18, last modified 2018-03-17
