Cheap Data Analytics using Cold Storage Devices

Borovica-Gajic, Renata; Appuswamy, Raja; Ailamaki, Anastasia

doi:10.14778/2994509.2994521

2016

Abstract

Enterprise databases use storage tiering to lower capital and operational expenses. In such a setting, data waterfalls from an SSD-based high-performance tier when it is "hot" (frequently accessed) to a disk-based capacity tier and finally to a tape-based archival tier when "cold" (rarely accessed). To address the unprecedented growth in the amount of cold data, hardware vendors introduced new devices named Cold Storage Devices (CSD) explicitly targeted at cold data workloads. With access latencies in tens of seconds and costs as low as $0.01/GB/month, CSD provide a middle ground between the low-latency (ms), high-cost, HDD-based capacity tier and the high-latency (minutes to hours), low-cost, tape-based archival tier. Driven by the price/performance aspect of CSD, this paper makes a case for using CSD as a replacement for both the capacity and archival tiers of enterprise databases. Although CSD offer major cost savings, we show that current database systems can suffer a severe performance drop when CSD are used as a replacement for HDD, due to the mismatch between the design assumptions made by the query execution engine and the actual storage characteristics of CSD. We then build a CSD-driven query execution framework, called Skipper, that modifies both the database execution engine and CSD scheduling algorithms to be aware of each other. Using results from our implementation of the architecture based on PostgreSQL and OpenStack Swift, we show that Skipper is capable of completely masking the high latency overhead of CSD, thereby opening up CSD for wider adoption as a storage tier for cheap data analytics over cold data.
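
The mismatch described in the abstract can be illustrated with a toy latency model. This is a minimal sketch under loudly stated assumptions, not Skipper's algorithm: it assumes a hypothetical CSD that serves one group of disks at a time, charges a hypothetical switch cost of 10 s (within the "tens of seconds" range mentioned above) whenever the active group changes, and charges 0.1 s per object once a group is active. The names SWITCH_S, READ_S, total_latency, and group_aware_order are illustrative only. An engine that insists on a fixed scan order pays a switch on almost every request, while a device-aware order batches requests per group.

```python
"""Toy latency model of a cold storage device (CSD); illustrative assumptions only.

Hypothetical assumptions, not taken from the paper: the device serves one disk
group at a time, switching groups costs SWITCH_S seconds, and reading one object
from the active group costs READ_S seconds.
"""
from typing import List

SWITCH_S = 10.0  # hypothetical group-switch latency ("tens of seconds" range)
READ_S = 0.1     # hypothetical per-object transfer time once a group is active


def total_latency(request_groups: List[int]) -> float:
    """Serve requests strictly in the given order; pay SWITCH_S whenever the group changes."""
    active = None
    elapsed = 0.0
    for group in request_groups:
        if group != active:
            elapsed += SWITCH_S  # device must switch to a different disk group
            active = group
        elapsed += READ_S  # transfer the requested object
    return elapsed


def group_aware_order(request_groups: List[int]) -> List[int]:
    """Reorder pending requests so that all requests for one group are served together."""
    return sorted(request_groups)


# A scan whose objects are spread round-robin over three disk groups.
requests = [0, 1, 2] * 4
naive = total_latency(requests)
batched = total_latency(group_aware_order(requests))
print(f"fixed order: {naive:.1f} s, group-aware order: {batched:.1f} s")
# prints "fixed order: 121.2 s, group-aware order: 31.2 s"
```

Reordering only pays off if the query engine can consume data in whatever order the device returns it, which is why, in this toy setting, the engine and the device scheduler have to be designed with knowledge of each other.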

Details

Title Cheap Data Analytics using Cold Storage Devices

Author(s) Borovica-Gajic, Renata ; Appuswamy, Raja ; Ailamaki, Anastasia

Published in Proceedings of the VLDB Endowment

Pagination 12

Volume 9

Issue 12

Pages 1029-1040

Conference 42nd International Conference on Very Large Data Bases, New Delhi, India, September 5-9, 2016

Date 2016

Publisher New York, Assoc Computing Machinery

DOI https://doi.org/10.14778/2994509.2994521

Laboratories DIAS

Record appears in Scientific production and competences > I&C - School of Computer and Communication Sciences > IINFCOM > DIAS - Data-Intensive Applications and Systems Laboratory; Peer-reviewed publications; Conference Papers; Work produced at EPFL; Published

Record creation date 2016-08-03