
Infoscience

EPFL, École polytechnique fédérale de Lausanne

conference paper

Dalton: Learned Partitioning for Distributed Data Streams

Zapridou, Eleni; Mytilinis, Ioannis; Ailamaki, Anastasia
2022
Proceedings of the VLDB Endowment
International Conference on Very Large Data Bases (VLDB 2022)

To sustain the input rate of high-throughput streams, modern stream processing systems rely on parallel execution. However, skewed data yield imbalanced load assignments and create stragglers that hinder scalability. Deciding on a static partitioning for a given set of “hot” keys is not sufficient, as these keys are not known in advance and, even worse, the data distribution can change unpredictably. Existing algorithms either optimize for a specific distribution or, in order to adapt, assume a centralized partitioner that processes every incoming tuple and observes the whole workload. This is not realistic in a distributed environment with multiple parallel upstream operators, where the centralized partitioner itself becomes the bottleneck and limits scalability. In this work, we propose Dalton, a lightweight, adaptive, yet scalable partitioning operator that relies on reinforcement learning. By memoizing state and dynamically keeping track of recent experience, Dalton (i) adjusts its policy at runtime and quickly adapts to the workload, (ii) avoids redundant computations and minimizes the per-tuple partitioning overhead, and (iii) efficiently scales out to multiple instances that learn cooperatively and converge to a joint policy. Our experiments indicate that Dalton scales regardless of the input data distribution and sustains 1.3× to 6.7× higher throughput than existing approaches.
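To make the skew problem concrete, the following toy Python simulation compares plain hash partitioning, where every tuple of a key goes to one worker, with a well-known key-splitting heuristic, power-of-two-choices (also known as partial key grouping), on a synthetic Zipf-skewed stream. This is not Dalton's algorithm and is not taken from the paper: it is a single, non-learning partitioner that sees every tuple, which is exactly the setting the abstract argues does not scale. The workload parameters, the second hash function, and all helper names (`zipf_stream`, `two_choice_partition`, `imbalance`, ...) are hypothetical choices for illustration.

```python
import random


def zipf_stream(n, num_keys, s=1.0, seed=0):
    """Draw n keys where key k has weight 1/(k+1)**s (hypothetical skewed workload)."""
    rng = random.Random(seed)
    weights = [1.0 / (k + 1) ** s for k in range(num_keys)]
    return rng.choices(range(num_keys), weights=weights, k=n)


def h1(key, workers):
    # First candidate worker: plain modulo hashing.
    return key % workers


def h2(key, workers):
    # Second candidate worker: Knuth-style multiplicative hash (an assumed choice).
    # The 32-bit product is scaled down to the range [0, workers).
    return ((key + 1) * 2654435761 % 2**32) * workers >> 32


def hash_partition(stream, workers):
    """Key grouping: every tuple of a key is routed to the same worker."""
    loads = [0] * workers
    for key in stream:
        loads[h1(key, workers)] += 1
    return loads


def two_choice_partition(stream, workers):
    """Key splitting: route each tuple to the less-loaded of its key's two candidates.

    A key's state may end up on two workers, so a downstream step would have to
    merge partial results per key (not modelled here).
    """
    loads = [0] * workers
    for key in stream:
        a, b = h1(key, workers), h2(key, workers)
        target = a if loads[a] <= loads[b] else b
        loads[target] += 1
    return loads


def imbalance(loads):
    """Maximum load divided by mean load; 1.0 means perfectly balanced."""
    return max(loads) / (sum(loads) / len(loads))


if __name__ == "__main__":
    stream = zipf_stream(n=20_000, num_keys=1_000)
    for name, partition in [("hash", hash_partition), ("two-choice", two_choice_partition)]:
        loads = partition(stream, workers=8)
        print(f"{name:>10}: loads={loads} imbalance={imbalance(loads):.2f}")
```

With these parameters the hottest key alone carries roughly 13% of the tuples, so the worker it hashes to becomes a straggler under key grouping, while splitting hot keys across two candidates keeps the loads much closer to the mean. The sketch deliberately omits what the abstract identifies as the hard part: adapting when the distribution shifts, and doing so with several upstream partitioner instances that each see only part of the stream.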

Name: p491-zapridou.pdf
Type: Publisher
Access type: openaccess
License condition: CC BY-NC-ND
Size: 2.6 MB
Format: Adobe PDF
Checksum (MD5): 142f27277039b4a09c675bf80766602e

Contact: infoscience@epfl.ch


Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved.