GSN Cloud Storage and Processing
GSN (Global Sensor Networks) manages configurable virtual sensors through a wide range of wrappers and handles both one-shot and continuous queries, even in a distributed environment with several GSN instances. However, each GSN instance runs on a single machine and uses a relational data store underneath. While this is sufficient for most medium-sized sensor deployments, scalability can become a problem at various stages when very large numbers of sensor observations must be processed at very high incoming rates. This project aims to integrate Spark Streaming, an extension of the core Spark API, with GSN to speed up stream query processing in a multi-node environment and achieve better scalability. We show the feasibility of our approach and demonstrate its scalability through two applications: linear segmentation, which discovers trends in weather data, and anomaly detection, which identifies occasions when live temperature data delivers unreasonable values.
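
To make the anomaly detection application more concrete, the following is a minimal single-machine sketch in plain Python of what "unreasonable" temperature values could mean. It is not the project's implementation and does not use Spark Streaming or GSN. The function name `find_anomalies`, the plausibility bounds, and the maximum jump between readings are all assumptions chosen for illustration.

```python
from typing import Iterable, List, Tuple

# Assumed plausibility bounds in degrees Celsius; illustrative only.
MIN_TEMP_C = -60.0
MAX_TEMP_C = 60.0
# Assumed largest believable change between consecutive good readings.
MAX_JUMP_C = 10.0


def find_anomalies(readings: Iterable[Tuple[int, float]]) -> List[Tuple[int, float]]:
    """Return the (timestamp, temperature) pairs that look unreasonable.

    A reading is flagged if it falls outside the assumed range, or if it
    differs from the last accepted reading by more than MAX_JUMP_C.
    """
    anomalies = []
    last_good = None  # most recent reading that passed both checks
    for timestamp, temp in readings:
        out_of_range = not (MIN_TEMP_C <= temp <= MAX_TEMP_C)
        jumped = last_good is not None and abs(temp - last_good) > MAX_JUMP_C
        if out_of_range or jumped:
            anomalies.append((timestamp, temp))
        else:
            last_good = temp
    return anomalies


# Hypothetical sample stream of (timestamp, temperature) pairs.
sample = [(1, 12.1), (2, 12.4), (3, 85.0), (4, 12.6), (5, -3.0), (6, 12.9)]
print(find_anomalies(sample))  # prints [(3, 85.0), (5, -3.0)]
```

In a multi-node setting, the same per-reading check would presumably be applied to each sensor's stream in parallel, with the last accepted reading kept as per-sensor state.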