Infoscience

Student project

Systematic Approach to Multi-layer Parallelisation of Time-based Stream Aggregation under Ingest Constraints in the Cloud

With its real-time capabilities, stream processing is popular for applications such as anomaly detection for residential gateways and analytics for business intelligence. As in other areas of computing, there has been an inevitable trend to shift stream processing to the cloud, thanks to virtualisation technologies and the ubiquity of the Web. The recently launched Amazon Kinesis is among the cloud-based stream buffer services that bridge the gap between off-cloud sources and cloud-based processing engines. Yet such services are subject to commercial or physical constraints on data ingest rate, calling for the parallelisation and chaining of processing nodes in a multi-layer topology. In this work, we studied the multi-layer parallelisation of time-based stream aggregation, a commonplace component in stream processing applications, under ingest rate constraints in the cloud. In particular, we conducted comprehensive analyses of the rate transfer properties of processing nodes at various aggregation layers, taking into account the stream sources (e.g. residential gateways) and their information flow. This led to our proposal of systematic approaches to determining a parallelisation topology that avoids ingest rate saturation while minimising operational costs and deployment complexity. By applying these approaches, system over-provisioning and trial-and-error design can be eliminated. Our analyses were empirically verified through various simulations. Prototyping in the real Kinesis environment was also conducted to back up our analytical results and proposed topology determination approaches. Although the work was motivated by and prototyped with Amazon Kinesis, it remains generic in nature, and its applicability can extend beyond the specific scenario of Kinesis.
