Efficiently Maintaining Distributed Model-Based Views on Real-Time Data Streams
Minimizing communication cost is a fundamental problem in large-scale federated sensor networks. Existing solutions are often ad hoc, tailored to specific query types, or become inefficient when query results contain large volumes of data that must be transferred over the network. Maintaining model-based views of data streams has recently attracted attention because it makes communication efficient: nodes transmit parameter values for the models instead of the original data streams. This paper proposes a novel framework that exploits model-based views for communication-efficient stream data processing over federated sensor networks while significantly improving on state-of-the-art approaches. The framework is generic, so any time-parameterized model can be plugged in, and it ensures accuracy guarantees for query results throughout large-scale networks. In addition, we boost the performance of the framework with the coded model update, which enables efficient model updates from one node to another. The coded model update predetermines parameter values for the model, updates only the identifiers of those parameter values, and compresses the identifiers using bitmaps. Moreover, we propose a novel correlation model, named the coded inter-variable model, that integrates the efficiency of the coded model update into more precise predictions of correlated models. Empirical studies with real data demonstrate that our proposal achieves substantial reductions in communication, outperforming a state-of-the-art method.
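To make the idea of the coded model update concrete, the following is a minimal Python sketch and not the framework's actual implementation. It assumes a linear time-parameterized model v(t) = a + b·t, two hypothetical codebooks of predetermined parameter values (`INTERCEPTS`, `SLOPES`), and one plausible use of a bitmap: a bit per parameter marking whether its identifier changed since the last update. All names, codebook ranges, and the bitmap layout are illustrative assumptions.

```python
from bisect import bisect_left

# Hypothetical codebooks: the predetermined values each parameter may take.
# Both lists are sorted and are assumed to be known to sender and receiver.
INTERCEPTS = [i * 0.5 for i in range(64)]       # 0.0, 0.5, ..., 31.5
SLOPES = [(i - 16) * 0.01 for i in range(32)]   # about -0.16 ... 0.15
CODEBOOKS = [INTERCEPTS, SLOPES]


def nearest_id(codebook, value):
    """Return the identifier (index) of the codebook entry closest to value."""
    pos = bisect_left(codebook, value)
    candidates = [i for i in (pos - 1, pos) if 0 <= i < len(codebook)]
    return min(candidates, key=lambda i: abs(codebook[i] - value))


def encode_update(prev_ids, params):
    """Sender side: map fitted parameters to identifiers and emit only changes.

    Returns (bitmap, changed_ids, new_ids). Bit k of bitmap is set when the
    identifier of parameter k differs from the previously sent one.
    """
    new_ids = [nearest_id(cb, p) for cb, p in zip(CODEBOOKS, params)]
    bitmap = 0
    changed_ids = []
    for k, (old, new) in enumerate(zip(prev_ids, new_ids)):
        if old != new:
            bitmap |= 1 << k
            changed_ids.append(new)
    return bitmap, changed_ids, new_ids


def apply_update(prev_ids, bitmap, changed_ids):
    """Receiver side: rebuild the full identifier list from a coded update."""
    ids = list(prev_ids)
    remaining = iter(changed_ids)
    for k in range(len(ids)):
        if bitmap & (1 << k):
            ids[k] = next(remaining)
    return ids


def predict(ids, t):
    """Evaluate the assumed linear model at time t from its identifiers."""
    return INTERCEPTS[ids[0]] + SLOPES[ids[1]] * t


# Usage: the receiver holds ids [40, 18]; the sender refits the model and
# obtains parameters that map to a new slope identifier only.
receiver_ids = [40, 18]
bitmap, changed_ids, _ = encode_update(receiver_ids, (20.1, 0.052))
receiver_ids = apply_update(receiver_ids, bitmap, changed_ids)
```

In this sketch only the bitmap and the changed identifiers cross the network, rather than full-precision parameter values. Snapping parameters to a codebook introduces quantization error, so any accuracy guarantee would have to account for the codebook resolution; how the framework does so is not specified here.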