report

Fast Correlation Discovery for Large-Scale Streaming Time-Series Data

Guo, Tian • Sathe, Saket • Aberer, Karl
2014

The dramatic rise of streaming time-series data produced in a variety of contexts, such as stock markets, mobile sensing, sensor networks, and data centre monitoring, has fuelled the development of large-scale distributed real-time computation systems (e.g., Apache Storm, Spark Streaming, and S4). However, it is still unclear how certain important tasks, which can be performed with relative ease in a centralized system, could be performed using such distributed systems. In this paper, we focus on one such task: continuously discovering correlations among a large number of streaming time series. In doing so, we address two key challenges: (1) the number of time-series pairs that have to be analyzed grows quadratically (O(n²)) in the number of time series n, giving rise to a quadratic increase in the communication cost between different nodes of the distributed system; and (2) as the size of the time series grows, the computational and communication costs again increase at a prohibitive rate. To tackle these challenges, we propose an approach referred to as AEGIS. First, AEGIS approximates a group of streams using affine transformations and communicates only these stream groups, which are smaller in size, thereby significantly reducing the communication overhead. Second, AEGIS dramatically enhances computational efficiency by exploiting the properties of affine transformations to prune the number of evaluated correlations. As baselines, we adapt well-known centralized correlation computation approaches to the distributed environment. Our extensive experimental evaluations on real and synthetic datasets establish that AEGIS outperforms the baseline approaches in terms of communication cost, processing latency, and peak capacity.
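The two quantities the abstract leans on can be illustrated with a small sketch. This is not the AEGIS algorithm, which this record does not describe in detail. It only shows (a) how the number of pairs grows with n, and (b) a standard property of Pearson correlation: it is unchanged when one series is replaced by a positive-slope affine transformation of itself, and changes sign when the slope is negative. The function names `pearson` and `num_pairs` and the sample data are hypothetical, chosen for illustration.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def num_pairs(n):
    """Number of unordered pairs among n time series: n(n-1)/2, i.e. O(n^2)."""
    return n * (n - 1) // 2


# Quadratic growth: the closed form matches explicit enumeration of pairs.
assert num_pairs(10) == len(list(combinations(range(10), 2)))

# Illustrative data (made up for this sketch).
x = [1.0, 2.0, 4.0, 3.0, 5.0]
y = [2.0, 1.0, 3.0, 5.0, 4.0]

# z is an affine transformation of y with positive slope.
z = [3.0 * v + 7.0 for v in y]
# w is an affine transformation of y with negative slope.
w = [-2.0 * v + 1.0 for v in y]

r_xy = pearson(x, y)
r_xz = pearson(x, z)  # equal to r_xy up to floating-point error
r_xw = pearson(x, w)  # equal to -r_xy up to floating-point error
```

Under this property, once a series is known to be (approximately) an affine image of another, its correlation with a third series does not need to be recomputed from raw values. How AEGIS groups streams and prunes pairs is specified in the report itself, not here.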

Type: report
Author(s): Guo, Tian • Sathe, Saket • Aberer, Karl
Date Issued: 2014
Written at: EPFL
EPFL units: LSIR
Available on Infoscience: August 4, 2014
Use this identifier to reference this record: https://infoscience.epfl.ch/handle/20.500.14299/105408