Infoscience

report

Fast Correlation Discovery for Large-Scale Streaming Time-Series Data

Guo, Tian • Sathe, Saket • Aberer, Karl
2014

The dramatic rise of streaming time-series data produced in a variety of contexts, such as stock markets, mobile sensing, sensor networks, and data centre monitoring, has fuelled the development of large-scale distributed real-time computation systems (e.g., Apache Storm, Spark Streaming, and S4). However, it is still unclear how certain important tasks, which can be performed with relative ease in a centralized system, could be performed using such distributed systems. In this paper, we focus on one such task: continuously discovering correlations among a large number of streaming time series. While doing so, we address two key challenges: (1) the number of time-series pairs that have to be analyzed grows quadratically (O(n²)) in the number of time series n, giving rise to a quadratic increase in the communication cost between different nodes of the distributed system; (2) as the size of the time series grows, the computational and communication costs again increase at a prohibitive rate.

To tackle these challenges, we propose an approach referred to as AEGIS. First, AEGIS approximates a group of streams using affine transformations. It then communicates only these stream groups, which are smaller in size, and thereby significantly reduces the communication overhead. Second, AEGIS dramatically enhances the computational efficiency by exploiting the properties of affine transformations to prune the number of evaluated correlations. As baselines, we adapt well-known centralized correlation computation approaches to the distributed environment. Our extensive experimental evaluations on real and synthetic datasets establish that AEGIS outperforms the baseline approaches in terms of communication cost, processing latency, and peak capacity.
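The two ideas in the abstract can be illustrated with a minimal sketch. It is not the AEGIS algorithm, whose details the abstract does not give. It shows only (a) how quickly the number of time-series pairs grows with n, and (b) a standard property of the Pearson correlation: if one series is an exact affine image of another, y = a·x + b, its correlation with any third series follows from that of x, up to the sign of a. This is one way affine relationships can let a system skip evaluating pairs. The function names `pair_count`, `pearson`, and `affine` and the sample values are assumptions made for illustration.

```python
import math


def pair_count(n):
    """Number of unordered pairs among n time series: n(n-1)/2, i.e. O(n^2)."""
    return n * (n - 1) // 2


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def affine(x, scale, offset):
    """Apply the scalar affine map v -> scale * v + offset to every sample."""
    return [scale * v + offset for v in x]


# Illustrative data (hypothetical values, not from the report).
x = [1.0, 2.0, 4.0, 3.0, 5.0]
z = [2.0, 1.0, 3.0, 5.0, 4.0]

# y is an exact affine image of x with a positive scale.
y = affine(x, 2.5, -1.0)

# corr(y, z) equals corr(x, z), so it need not be evaluated separately.
r_xz = pearson(x, z)
r_yz = pearson(y, z)

# With a negative scale, only the sign of the correlation flips.
r_neg = pearson(affine(x, -2.0, 3.0), z)

if __name__ == "__main__":
    print("pairs for n = 1000:", pair_count(1000))
    print("corr(x, z):", r_xz)
    print("corr(y, z):", r_yz)
    print("corr(-2x + 3, z):", r_neg)
```

In practice streams are only approximately affine images of one another, so a real system would need to bound the resulting error; the sketch covers the exact case only.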

Type: report
Author(s): Guo, Tian; Sathe, Saket; Aberer, Karl
Date Issued: 2014
Written at: EPFL
EPFL units: LSIR
Available on Infoscience: August 4, 2014
Use this identifier to reference this record: https://infoscience.epfl.ch/handle/20.500.14299/105408
Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved