Online indexing and distributed querying model-view sensor data in the cloud
As various kinds of sensors penetrate our daily life (e.g., sensor networks for environmental monitoring, GPS for localization and navigation), the efficient management of massive amounts of sensor data is becoming increasingly important. Traditional solutions based on relational databases lack the scalability to accommodate large-scale sensor data efficiently, so many sensor data management systems are implemented on key-value stores in the cloud. Meanwhile, model-view sensor data management, which stores sensor data in the form of modelled segments, largely reduces the amount of raw data. However, there are currently no index or query optimizations for these modelled segments in the cloud, so query processing requires a full table scan in the worst case. In this paper, we first propose an innovative model index for sensor data segments in key-value stores (KVM-index). KVM-index consists of two interval indices, one on the time dimension and one on the sensor value dimension, each of which has an in-memory search tree and a secondary list materialized in the key-value store. This composite in-memory and key-value structure allows new incoming sensor data segments to be indexed with a constant number of network I/Os. Second, for time- (or value-) range and point queries, we design a MapReduce-based approach that processes only the discrete, predicate-related ranges of the KVM-index table, thereby eliminating the computation and communication overheads that conventional MapReduce programs incur by accessing irrelevant parts of the index table. Finally, we propose a cost-based adaptive strategy for the KVM-index-MapReduce framework to process composite queries on both the time and value dimensions. Extensive experiments in a private cloud show that our approach outperforms, in query response time, both MapReduce-based processing of the raw sensor data and multiple alternative approaches for querying model-view sensor data.
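To make the update and range-query behaviour described above more concrete, here is a minimal Python sketch of a single interval index over one dimension (time or value). It rests on several assumptions that the abstract does not state. Keys are integers in a fixed domain. The in-memory "search tree" is a virtual balanced binary tree whose node values double as row keys. Each segment is stored under its fork node, meaning the first node on the root-to-leaf descent that falls inside the segment; this idea is borrowed from the relational interval tree. A Python dict stands in for the key-value store. The class `HypotheticalIntervalIndex` and its methods are illustrative only and are not the paper's implementation.

```python
from collections import defaultdict


class HypotheticalIntervalIndex:
    """Toy interval index (assumption, not KVM-index itself): an in-memory
    virtual binary tree over keys [1, 2**height - 1] picks the row for each
    segment, and a dict stands in for the key-value store (one list per node)."""

    def __init__(self, height):
        self.root = 2 ** (height - 1)
        self.max_key = 2 ** height - 1
        self.kv_store = defaultdict(list)  # row key (tree node) -> segments
        self.kv_writes = 0  # simulated network I/Os spent on updates

    def _walk(self, lo, hi):
        """Yield tree nodes from the root down to the first node in [lo, hi].
        Purely in-memory: the tree is implicit, so nothing is fetched."""
        node, step = self.root, self.root // 2
        while True:
            yield node
            if lo <= node <= hi:
                return
            # [lo, hi] lies entirely on one side of node: descend that way.
            node = node - step if hi < node else node + step
            step //= 2

    def insert(self, segment_id, start, end):
        """Store a segment at its fork node: an in-memory walk followed by
        exactly one write to the (simulated) key-value store."""
        if not 1 <= start <= end <= self.max_key:
            raise ValueError("segment must lie inside the index domain")
        *_, fork = self._walk(start, end)
        self.kv_store[fork].append((start, end, segment_id))
        self.kv_writes += 1
        return fork

    def candidate_keys(self, qlo, qhi):
        """Row keys that can hold segments intersecting [qlo, qhi]: the two
        root-to-endpoint paths (discrete keys) plus every key in [qlo, qhi]
        (a contiguous range, i.e. one range scan in a real store)."""
        qlo, qhi = max(qlo, 1), min(qhi, self.max_key)
        if qlo > qhi:
            return []
        keys = set(self._walk(qlo, qlo)) | set(self._walk(qhi, qhi))
        keys.update(range(qlo, qhi + 1))
        return sorted(keys)

    def range_query(self, qlo, qhi):
        """Read only the candidate rows, then keep overlapping segments."""
        hits = []
        for key in self.candidate_keys(qlo, qhi):
            for start, end, seg in self.kv_store.get(key, []):
                if start <= qhi and end >= qlo:
                    hits.append(seg)
        return sorted(hits)


if __name__ == "__main__":
    index = HypotheticalIntervalIndex(height=4)  # keys 1..15
    index.insert("s1", 2, 5)
    index.insert("s2", 6, 9)
    index.insert("s3", 12, 14)
    print(index.range_query(4, 7))  # ['s1', 's2']
    print(index.kv_writes)  # 3
```

In this sketch each insertion costs one store write no matter how many segments are already indexed, which mirrors the constant network I/O property claimed above. A range query touches a contiguous block of row keys plus O(height) discrete keys on two root-to-endpoint paths. In a MapReduce setting, such key sets could plausibly serve as the input ranges, which is the intuition behind reading only predicate-related parts of the index table rather than scanning all of it.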