Thread-Placement Learning
In a non-uniform memory access (NUMA) machine, the placement of software threads onto hardware cores can have a significant effect on the performance of concurrent applications. Detecting the best possible placement for each application is therefore a necessity for thread scheduling. Yet, due to the difficulty of this problem, operating-system schedulers do not attempt to understand the needs of applications, but rather rely on (non-portable) scheduling heuristics.
In this paper, we introduce thread-placement learning (TPLE), a technique for understanding the placement requirements of applications. TPLE uses machine learning and performance counters to choose between different placement policies. To feed the machine-learning model, TPLE requires a set of portable microbenchmarks that produce training data, i.e., performance-counter measurements for all the target placement policies. We use this data to train a classifier that can choose between these policies online in order to change the thread placement of a running application.
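The train-then-classify flow described above can be illustrated with a minimal sketch. It is not the classifier used in the paper, which this record does not specify: a nearest-centroid model is swapped in for simplicity, and the two-value counter vectors, the sample values, and the names `TRAINING`, `train_centroids`, and `choose_policy` are all hypothetical.

```python
from math import dist

# Hypothetical training samples: (normalized counter vector, best policy).
# The two features and every value are made-up placeholders, not
# measurements or counters taken from the paper.
TRAINING = [
    ((0.80, 0.30), "locality"),
    ((0.70, 0.25), "locality"),
    ((0.20, 0.60), "round_robin"),
    ((0.10, 0.70), "round_robin"),
]


def train_centroids(samples):
    """Average the feature vectors of each label (nearest-centroid model)."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {
        label: tuple(v / counts[label] for v in acc)
        for label, acc in sums.items()
    }


def choose_policy(centroids, counters):
    """Return the policy whose centroid is closest to the observed counters."""
    return min(centroids, key=lambda label: dist(centroids[label], counters))


centroids = train_centroids(TRAINING)
print(choose_policy(centroids, (0.75, 0.28)))
```

In an online setting, `choose_policy` would be called periodically with fresh counter readings from the running application; how those readings are collected is outside the scope of this sketch.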
We demonstrate the practicality of TPLE by implementing a thread-placement algorithm named Slate. Slate selects automatically and online (i.e., at runtime) between the two most commonly used placement policies, namely locality and round-robin placement on the nodes of a multicore. To the best of our knowledge, Slate is the first online thread-placement algorithm that uses machine learning in combination with performance counters. We evaluate Slate and show that it achieves up to 93% accuracy in its decisions and outperforms the Linux scheduler by up to 16%.
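For illustration, the two policies can be sketched as functions that map threads to cores. This is one plausible reading, not Slate's implementation: locality is assumed to fill one node before using the next, and round-robin is assumed to take one core from each node in turn. The topology (`nodes`, a list of per-node core-ID lists) and both function names are hypothetical.

```python
def locality_placement(nodes, n_threads):
    """Fill each node's cores before moving on to the next node."""
    cores = [core for node in nodes for core in node]
    return cores[:n_threads]


def round_robin_placement(nodes, n_threads):
    """Take one core from each node in turn, spreading threads across nodes."""
    order = []
    depth = max(len(node) for node in nodes)
    for i in range(depth):
        for node in nodes:
            if i < len(node):
                order.append(node[i])
    return order[:n_threads]


# Hypothetical two-node machine with four cores per node.
nodes = [[0, 1, 2, 3], [4, 5, 6, 7]]
print(locality_placement(nodes, 4))
print(round_robin_placement(nodes, 4))
```

Each returned list gives the core assigned to thread 0, thread 1, and so on. On Linux, such an assignment could be applied with `os.sched_setaffinity`; that step is omitted here to keep the sketch portable.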
WOS:000667971400080
2020-01-01
978-1-7281-7002-2
Los Alamitos
IEEE International Conference on Distributed Computing Systems
877
887
REVIEWED
Event name | Event place | Event date |
 | ELECTR NETWORK | Nov 29-Dec 01, 2020 |