Infoscience

conference paper

RUBIK: Efficient Threshold Queries on Massive Time Series

Tzirita Zacharatou, Eleni • Heinis, Thomas • Tauheed, Farhan • et al.
2015
Proceedings of the 27th International Conference on Scientific and Statistical Database Management
27th International Conference on Scientific and Statistical Database Management

An increasing number of applications in finance, meteorology, science, and other domains produce time series as output. Analyzing this vast amount of time series is key to understanding the phenomena studied, particularly in the simulation sciences, where analyzing the time series that result from a simulation allows scientists to refine the simulated model. Existing approaches to querying time series typically keep a compact representation in main memory, use it to answer queries approximately, and then access the exact time series data on disk to validate the result. The more precise the in-memory representation, the fewer disk accesses are needed to validate the result. With the massive sizes of today's datasets, however, current in-memory representations often no longer fit into main memory. To make them fit, their precision has to be reduced considerably, resulting in substantial disk access, which impedes query execution today and limits scalability for even bigger datasets in the future. In this paper we develop RUBIK, a novel approach to compressing and indexing time series. RUBIK exploits the fact that time series in many applications, particularly in the simulation sciences, are similar to each other. It compresses similar time series, i.e., observation values as well as time information, achieving better space efficiency and improved precision. RUBIK translates threshold queries into two-dimensional spatial queries and efficiently executes them on the compressed time series by exploiting the pruning power of a tree structure to find the result, thereby outperforming the state of the art by a factor of between 6 and 23. As our experiments further indicate, exploiting similarity within and between time series is crucial to making query execution scale and, ultimately, to decoupling query execution time from the growth of the data (size and number of time series).
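The filter-and-validate pattern that the abstract attributes to existing approaches can be illustrated with a minimal sketch. This is not RUBIK itself, and it rests on several assumptions that do not come from the paper: a threshold query is taken to ask which series have at least one value above a threshold `theta` within a time range, the compact representation is a per-segment minimum and maximum, and the names `Segment`, `summarize`, and `threshold_query` are hypothetical. A plain dictionary stands in for the exact data on disk.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Segment:
    """Hypothetical compact summary of one fixed-width slice of a series."""
    start: int   # first time index covered (inclusive)
    end: int     # last time index covered (exclusive)
    lo: float    # minimum value in the slice
    hi: float    # maximum value in the slice


def summarize(values: Sequence[float], width: int) -> List[Segment]:
    """Build the in-memory summary: one min/max pair per slice of `width` points."""
    return [
        Segment(i, min(i + width, len(values)),
                min(values[i:i + width]), max(values[i:i + width]))
        for i in range(0, len(values), width)
    ]


def threshold_query(
    exact: Dict[str, Sequence[float]],     # stands in for the on-disk data
    summaries: Dict[str, List[Segment]],   # stands in for the in-memory index
    t_start: int,
    t_end: int,
    theta: float,
) -> Tuple[List[str], int]:
    """Return the ids of series with a value > theta in [t_start, t_end),
    plus the number of series whose exact data had to be read."""
    result: List[str] = []
    validated = 0
    for sid, segments in summaries.items():
        overlapping = [s for s in segments if s.start < t_end and s.end > t_start]
        # Prune: if no overlapping slice ever exceeds theta, the series cannot qualify.
        if all(s.hi <= theta for s in overlapping):
            continue
        # Accept without reading exact data: a slice lying fully inside the
        # query range whose maximum exceeds theta guarantees a qualifying value.
        if any(s.hi > theta and t_start <= s.start and s.end <= t_end
               for s in overlapping):
            result.append(sid)
            continue
        # Otherwise the summary is inconclusive, so validate against exact data.
        validated += 1
        if any(v > theta for v in exact[sid][t_start:t_end]):
            result.append(sid)
    return result, validated
```

In this sketch, narrower segments give a more precise summary and therefore fewer validations, at the cost of a larger in-memory structure. That is the trade-off the abstract describes: when the summary has to be coarsened to fit into memory, more queries fall through to the exact data.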

Files

Name: Eleni_Tzirita_Zacharatou_RUBIK.pptx
Access type: openaccess
Size: 996.89 KB
Format: Microsoft Powerpoint XML
Checksum (MD5): a4155970d68431ed463ce34236ea8884

Name: a17-zacharatou.pdf
Access type: openaccess
Size: 1.43 MB
Format: Adobe PDF
Checksum (MD5): 22c2202a9617e23ef755dac2b77a9d9f

Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved