Infoscience

conference paper

Pixels: An Efficient Column Store for Cloud Data Lakes

Bian, Haoqiong; Ailamaki, Anastasia
2022
2022 IEEE 38th International Conference on Data Engineering (ICDE)
38th International Conference on Data Engineering (ICDE)

To benefit from the cloud’s higher elasticity and price-efficiency, most modern data-lake engines support S3-like cloud object storage (COS) services as their optional or preferred underlying storage. Meanwhile, widely used column stores, such as Parquet, are applied in these data lakes to improve analytical performance. However, because these column stores were designed for on-premises HDFS, they often suffer from the high latency of COS and deliver sub-optimal query performance. We observe that by optimizing the storage layout and data access pattern, we can effectively hide and mitigate the high latency. In this paper, we present Pixels, a column store optimized for the cloud that solves the problem through (1) workload-driven storage layout optimization within and across row group boundaries, and (2) I/O scheduling that takes into account the optimized storage layout and the performance characteristics of COS. Together, these techniques improve analytical performance transparently, without affecting data ingestion or query execution in data lakes. Evaluations show that Pixels outperforms the state-of-the-art column store on COS by more than one order of magnitude on a real-world workload and by 1.93x on TPC-H. Moreover, the performance of Pixels is also portable to HDFS.
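
To illustrate why access patterns matter on high-latency object storage, the following is a minimal sketch of byte-range coalescing: reads of nearby column chunks are merged so that fewer requests are issued. This is a generic technique shown for intuition only, not the scheduling algorithm from the paper. The function name `coalesce_ranges`, the `max_gap` parameter, and the chunk offsets are all hypothetical.

```python
from typing import List, Tuple

# A byte range is (offset, length); both values are in bytes.
Range = Tuple[int, int]


def coalesce_ranges(ranges: List[Range], max_gap: int) -> List[Range]:
    """Merge byte ranges whose gap is at most max_gap bytes.

    Fewer, larger requests can amortize per-request latency, at the
    cost of reading some unneeded bytes in the gaps.
    """
    merged: List[Range] = []
    for offset, length in sorted(ranges):
        if merged:
            last_offset, last_length = merged[-1]
            last_end = last_offset + last_length
            if offset - last_end <= max_gap:
                # Extend the previous range to cover this one.
                new_end = max(last_end, offset + length)
                merged[-1] = (last_offset, new_end - last_offset)
                continue
        merged.append((offset, length))
    return merged


# Hypothetical column chunks that one query needs to read.
chunks = [(0, 100), (120, 80), (1000, 50), (1040, 30)]
requests = coalesce_ranges(chunks, max_gap=64)
print(requests)
```

With these assumed inputs, four chunk reads collapse into two requests. A layout that places columns frequently read together next to each other would make such merges more likely, which is the intuition behind combining layout optimization with I/O scheduling.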

  • Name: Pixels_An_Efficient_Column_Store_for_Cloud_Data_Lakes.pdf
  • Type: Publisher
  • Version: Published version
  • Access type: restricted
  • License Condition: copyright
  • Size: 791.92 KB
  • Format: Adobe PDF
  • Checksum (MD5): 12ff5b4348f7a9e1666a0f755f4dfd61

