SigCO: Mining Significant Correlations via a Distributed Real-time Computation Engine

Guo, Tian; Calbimonte, Jean-Paul; Zhuang, Hao; Aberer, Karl

doi:10.1109/BigData.2015.7363819

conference paper not in proceedings

2015 IEEE International Conference on Big Data

The dramatic rise of time-series data produced in a variety of contexts, such as stock markets, mobile sensing, sensor networks, and data centre monitoring, has fuelled the development of large-scale distributed real-time computation systems (e.g., Apache Storm, Samza, Spark Streaming, and S4). However, it is still unclear how certain time series mining tasks can be performed using such emerging systems. In this paper, we focus on the task of efficiently discovering statistically significant correlations among a large number of time series via a distributed real-time computation engine. We propose a framework referred to as SigCO. In SigCO, we put forward a novel partition-aware data shuffling scheme, which adaptively shuffles time series data only to the relevant nodes of the distributed real-time computation engine. In addition, we design a correlation computation approach based on a δ-hypercube structure, which is capable of pruning unnecessary correlation computations. Finally, our extensive experimental evaluations on real and synthetic datasets show that SigCO outperforms the baseline approaches across a range of performance metrics.
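The abstract does not spell out how the δ-hypercube structure works, so the following is only a minimal single-machine sketch of the general idea of pruning correlation computations with a grid of side δ; it is not the SigCO algorithm. It relies on a standard identity: for series that are mean-centred and scaled to unit norm, the Pearson correlation equals 1 - d²/2, where d is their Euclidean distance, so a correlation of at least θ implies d ≤ δ = sqrt(2(1 - θ)). Two such series can then differ by at most one grid cell in every coordinate. The names `unit_normalize`, `correlated_pairs`, and `grid_dims` are hypothetical, and the sketch assumes equal-length series with at least `grid_dims` points.

```python
import math
from collections import defaultdict
from itertools import product


def unit_normalize(series):
    """Centre a series on its mean and scale it to unit Euclidean norm."""
    mean = sum(series) / len(series)
    centered = [x - mean for x in series]
    norm = math.sqrt(sum(c * c for c in centered))
    if norm == 0:
        raise ValueError("a constant series has no defined correlation")
    return [c / norm for c in centered]


def correlated_pairs(series_by_id, threshold, grid_dims=2):
    """Return {(id_a, id_b): corr} for pairs with Pearson corr >= threshold.

    Illustrative grid pruning only (an assumption, not the paper's method):
    pairs whose grid cells are not adjacent are skipped without computing
    their correlation.
    """
    if not -1.0 <= threshold < 1.0:
        raise ValueError("threshold must lie in [-1, 1)")
    # corr >= threshold  <=>  Euclidean distance <= delta (unit-norm series).
    delta = math.sqrt(2.0 * (1.0 - threshold))
    normalized = {k: unit_normalize(v) for k, v in series_by_id.items()}

    # Bucket each series by the grid cell of its first grid_dims coordinates.
    # Distance over a subset of coordinates never exceeds the full distance,
    # so this projection cannot cause a qualifying pair to be missed.
    cells = defaultdict(list)
    for key, vec in normalized.items():
        cell = tuple(math.floor(x / delta) for x in vec[:grid_dims])
        cells[cell].append(key)

    result = {}
    for cell, members in cells.items():
        # Only the cell itself and its immediate neighbours can hold partners.
        for offset in product((-1, 0, 1), repeat=grid_dims):
            neighbor = tuple(c + o for c, o in zip(cell, offset))
            for a in members:
                for b in cells.get(neighbor, ()):
                    if a < b:  # visit each unordered pair once
                        corr = sum(x * y for x, y in
                                   zip(normalized[a], normalized[b]))
                        if corr >= threshold:
                            result[(a, b)] = corr
    return result
```

Calling `correlated_pairs({"a": [...], "b": [...]}, 0.9)` returns the qualifying pairs keyed by ordered identifier tuples. In a distributed setting the cell index could also serve as a routing key, so that a series is sent only to the workers responsible for its own and neighbouring cells; that mapping is an assumption made here for illustration and is not taken from the paper.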

Use this identifier to reference this record

https://infoscience.epfl.ch/handle/20.500.14299/117735

Name: 07363819.pdf

Type: Postprint

Access type: openaccess

Size: 1.06 MB

Format: Adobe PDF

Checksum (MD5): 68958d4aab707fd85fe9d65cc15be398