Top-k/w publish/subscribe: finding k most relevant publications in sliding time window w
Existing content-based publish/subscribe systems assume that all matching publications are equally relevant to a subscription. Because the distribution of publication content cannot be known in advance, two unwanted situations are likely: a subscriber receives either too many publications or too few. In this paper we present a new publish/subscribe model based on the sliding-window computation model. Our model assumes that publications differ in their relevance to a subscription. A subscriber receives the k most relevant publications published within a time window w, where k and w are parameters defined per subscription. As a consequence, the arrival rate of relevant publications per subscription is predefined and does not depend on the publication rate. Since not all relevant objects (i.e., publications in our case) can be kept in main memory, existing solutions immediately discard less relevant objects and store only a small representative set for subsequent delivery. We develop a probabilistic criterion that decides, upon the arrival of a new object, whether it may become a top-k object at some future point in time and should thus be stored in a special publications queue. We show that by accepting a typically very small probability of error, the queue length remains reasonably small and does not depend significantly on the publication rate. Finally, we experimentally evaluate our approach to demonstrate its applicability in practice.
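To make the top-k/w semantics concrete, the following minimal Python sketch computes, for a single subscription, the k most relevant publications within a window of length w. It is a brute-force reference for the model only, not the probabilistic queue developed in the paper: it keeps every publication that is still inside the window, which is exactly the memory cost the probabilistic criterion is meant to avoid. The class name TopKWindow, the numeric relevance scores, and the assumption of non-decreasing timestamps are illustrative choices, not taken from the paper.

```python
import heapq
from collections import deque


class TopKWindow:
    """Brute-force top-k over a sliding time window (illustrative only)."""

    def __init__(self, k, w):
        self.k = k  # number of publications delivered per window
        self.w = w  # window length, in the same units as the timestamps
        self.buffer = deque()  # (timestamp, relevance, pub_id), oldest first

    def publish(self, timestamp, relevance, pub_id):
        """Add a publication; timestamps are assumed to be non-decreasing."""
        self.buffer.append((timestamp, relevance, pub_id))
        self._expire(timestamp)

    def _expire(self, now):
        # Drop publications that have slid out of the window (now - w, now].
        while self.buffer and self.buffer[0][0] <= now - self.w:
            self.buffer.popleft()

    def top_k(self, now):
        """Return ids of the k most relevant publications still in the window."""
        self._expire(now)
        best = heapq.nlargest(self.k, self.buffer, key=lambda p: p[1])
        return [pub_id for _, _, pub_id in best]


window = TopKWindow(k=2, w=10)
window.publish(1, 0.9, "a")
window.publish(3, 0.4, "b")
window.publish(5, 0.7, "c")
print(window.top_k(now=5))   # ['a', 'c']
print(window.top_k(now=12))  # ['c', 'b']  ("a" has left the window)
```

Note that "b" is not in the top-k at time 5 but enters it at time 12, once "a" expires. This is why a less relevant publication cannot always be discarded on arrival, and why a criterion for which publications to retain is needed.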