Fast Distributed Correlation Discovery Over Streaming Time-Series Data

The dramatic rise of time-series data in a variety of contexts, such as social networks, mobile sensing, data centre monitoring, etc., has fuelled interest in obtaining real-time insights from such data using distributed stream processing systems. One such extremely valuable insight is the discovery of correlations in real-time from large-scale time-series data. A key challenge in discovering correlations is that the number of time-series pairs that have to be analyzed grows quadratically in the number of time-series, giving rise to a quadratic increase in both computation cost and communication cost between the cluster nodes in a distributed environment. To tackle the challenge, we propose a framework called AEGIS. AEGIS exploits well-established statistical properties to dramatically prune the number of time-series pairs that have to be evaluated for detecting interesting correlations. Our extensive experimental evaluations on real and synthetic datasets establish the efficacy of AEGIS over baselines.

Presented at:
ACM International Conference on Information and Knowledge Management (CIKM 2015), Melbourne, Australia, Oct 19-23, 2015

 Record created 2015-08-03, last modified 2018-03-17

Rate this document:

Rate this document:
(Not yet reviewed)