AFFINITY: Efficiently Querying Statistical Measures on Time-Series Data
Computing statistical measures for large databases of time series is a fundamental primitive for querying and mining time-series data [1–6]. This primitive is gaining importance with the increasing number and rapid growth of time-series databases. In this paper we introduce a framework for efficient computation of statistical measures by exploiting the concept of affine relationships. Affine relationships can be used to infer statistical measures for time series from other related time series instead of computing them directly, thereby reducing the overall computation cost significantly. The resulting methods show at least one order of magnitude improvement over the best known methods. To the best of our knowledge, this is the first work that presents a unified approach for computing and querying several statistical measures on time-series data. Our approach includes three key components, all of which exploit affine relationships. First, the AFCLST algorithm clusters the time-series data such that high-quality affine relationships can be easily found. Second, the SYMEX algorithm uses the clustered time series to efficiently compute the desired affine relationships. Third, the SCAPE index structure produces a many-fold improvement in the performance of processing several statistical queries by seamlessly indexing the affine relationships. Finally, we establish the effectiveness of our approaches through a comprehensive experimental evaluation using real datasets.
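To make the idea of inferring measures through an affine relationship concrete, the following minimal sketch covers only the simplest scalar case, where one series is assumed to satisfy y = a*x + b exactly. It is not the AFCLST, SYMEX, or SCAPE method described above; the function name `infer_stats` and the values of `a`, `b`, and `x` are hypothetical and chosen purely for illustration. The sketch relies on two standard identities: mean(y) = a*mean(x) + b and var(y) = a^2 * var(x).

```python
from statistics import fmean, pvariance


def infer_stats(mean_x, var_x, a, b):
    """Infer the mean and population variance of y = a*x + b
    from the already-known mean and variance of x (hypothetical helper)."""
    return a * mean_x + b, a * a * var_x


# Illustrative data: x is the "related" series, y is an exact affine image of x.
x = [2.0, 4.0, 6.0, 8.0]
a, b = 3.0, 1.0  # assumed affine coefficients
y = [a * v + b for v in x]

# Statistics of x are computed once...
mean_x, var_x = fmean(x), pvariance(x)

# ...and the statistics of y are inferred without scanning y.
mean_y, var_y = infer_stats(mean_x, var_x, a, b)

# Direct computation over y, for comparison only.
direct_mean_y, direct_var_y = fmean(y), pvariance(y)
```

In this toy setting the inferred values agree with the directly computed ones, and the inference costs a constant number of arithmetic operations regardless of the length of the series, which is the kind of saving the abstract alludes to.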
Record created on 2012-07-19, modified on 2016-08-09