Big-Data Streaming Applications Scheduling Based on Staged Multi-Armed Bandits

Kanoun, Karim; Tekin, Cem; Atienza, David; Van Der Schaar, Mihaela

doi:10.1109/Tc.2016.2550454

research article

2016

IEEE Transactions on Computers

Several techniques have recently been proposed to adapt Big-Data streaming applications to existing many-core platforms. Among these, online reinforcement learning methods learn how to adapt, at run-time, the throughput and resources allocated to the various streaming tasks depending on dynamically changing data stream characteristics and the desired application performance (e.g., accuracy). However, most state-of-the-art techniques consider only a single input stream in their application model and assume that the system knows the amount of resources to allocate to each task to achieve a desired performance. To address these limitations, we propose a new systematic and efficient methodology and associated algorithms for online learning and energy-efficient scheduling of Big-Data streaming applications with multiple streams on many-core systems with resource constraints. We formalize the problem of multi-stream scheduling as a staged decision problem in which the performance obtained for various resource allocations is unknown. The proposed scheduling methodology uses a novel class of online adaptive learning techniques, which we refer to as staged multi-armed bandits (S-MAB). Our scheduler learns online which processing method to assign to each stream and how to allocate its resources over time in order to maximize performance at run-time, without access to any offline information. Applied to a face detection streaming application, the proposed scheduler achieves performance similar to that of an optimal semi-online solution that has full knowledge of the input stream: the differences in throughput, observed quality, resource usage and energy efficiency are less than 1, 0.3, 0.2 and 4 percent, respectively.
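
For readers unfamiliar with bandit-based learning, the sketch below shows the general idea of learning online which processing method performs best without any offline information. It is not the S-MAB algorithm of the paper, whose details are not given in this record; it is the standard UCB1 bandit, and the class name `UCB1`, the function `simulate`, and the per-method quality values are hypothetical, chosen only for illustration.

```python
import math
import random


class UCB1:
    """Standard UCB1 bandit over a fixed set of arms (not the paper's S-MAB)."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms   # number of times each arm was played
        self.means = [0.0] * n_arms  # running mean reward of each arm

    def select(self):
        # Play every arm once before relying on the confidence bound.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        total = sum(self.counts)
        scores = [
            mean + math.sqrt(2.0 * math.log(total) / count)
            for mean, count in zip(self.means, self.counts)
        ]
        return scores.index(max(scores))

    def update(self, arm, reward):
        # Incremental update of the running mean for the played arm.
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


def simulate(true_quality, rounds=5000, seed=0):
    """Each arm stands for a hypothetical processing method whose
    success probability (true_quality) is unknown to the learner."""
    rng = random.Random(seed)
    bandit = UCB1(len(true_quality))
    for _ in range(rounds):
        arm = bandit.select()
        reward = 1.0 if rng.random() < true_quality[arm] else 0.0
        bandit.update(arm, reward)
    return bandit.counts


if __name__ == "__main__":
    # Assumed quality values for three hypothetical processing methods.
    print(simulate([0.2, 0.5, 0.8]))
```

In this toy setting the learner gradually concentrates its choices on the method with the highest observed reward. The staged formulation described in the abstract additionally decides how resources are allocated across multiple streams, which this single-stage sketch does not model.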

Type

research article

DOI

10.1109/Tc.2016.2550454

Web of Science ID

WOS:000388498600007

Author(s)

Kanoun, Karim

Tekin, Cem

Atienza, David

Van Der Schaar, Mihaela

Date Issued

2016

Publisher

Institute of Electrical and Electronics Engineers

Published in

IEEE Transactions on Computers

Volume

65

Issue

12

Start page

3591

End page

3605

Subjects

Scheduling

machine learning

many-core platforms

data mining

big-data

multiple streams processing

concept drift

Editorial or Peer reviewed

REVIEWED

Written at

EPFL

EPFL units

ESL

Available on Infoscience

January 24, 2017

Use this identifier to reference this record

https://infoscience.epfl.ch/handle/20.500.14299/133591