Scalable Delivery of Stream Query Result
Continuous queries over data streams typically produce large volumes of result streams. To scale up the system, one should carefully study the problem of delivering the result streams to the end users, which, unfortunately, is often over-looked in existing systems. In this paper, we leverage Distributed Publish/Subscribe System (DPSS), a scalable data dissemination infrastructure, for efficient stream query result delivery. To take advantage of DPSS's multicast-like data dissemination architecture, one has to exploit the common contents among different result streams and maximize the sharing of their delivery. Hence, we propose to merge the user queries into a few representative queries whose results subsume those of the original ones, and disseminate the result streams of these representative queries through the DPSS. To realize this approach, we study the stream query containment theories and propose efficient query grouping and merging algorithms. The proposed approach is non-intrusive and hence can be easily implemented as a middleware to be incorporated into existing stream processing systems. A prototype is developed on top of an open- source stream processing system and results of an extensive performance study on real datasets verify the efficacy of the proposed techniques.