Stretch: Balancing QoS and Throughput for Colocated Server Workloads on SMT Cores

Margaritov, Artemiy; Gupta, Siddharth; Gonzalez-Alberquilla, Rekai; Grot, Boris

doi:10.1109/HPCA.2019.00024


2019


Abstract

In a drive to maximize resource utilization, today's datacenters are moving to colocation of latency-sensitive and batch workloads on the same server. State-of-the-art deployments, such as those at Google, colocate such diverse workloads even on a single SMT core. This aggressive colocation is possible because a latency-sensitive service operating below its peak load has significant slack in its response latency with respect to its QoS target. The slack absorbs the degradation in single-thread performance that is inevitable under SMT colocation, without compromising QoS targets. This work observes that many batch applications can greatly benefit from a large instruction window to uncover ILP and MLP. Under SMT colocation, conventional wisdom holds that individual hardware threads should be limited in their ability to acquire and hold a disproportionately large share of microarchitectural resources so as not to compromise the performance of a co-running thread. We show that the performance slack inherent in latency-sensitive workloads operating at low to moderate load makes it safe to shift microarchitectural resources to a co-running batch thread without compromising QoS targets. Based on this insight, we introduce Stretch, a simple ROB partitioning scheme that is invoked by system software to provide one hardware thread with a much larger ROB partition at the expense of another thread. When Stretch is enabled for latency-sensitive workloads operating below their peak load on an SMT core, co-running batch applications gain 13% in performance on average (30% max) over baseline SMT colocation, without compromising QoS constraints.
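The idea of an asymmetric ROB split toggled by system software can be illustrated with a small toy model. This is a hypothetical sketch, not the authors' implementation: the ROB size (192 entries), the latency-sensitive thread's reduced share (`lc_share = 0.25`), the load threshold (70% of peak), and all names below are assumptions chosen for illustration. The baseline is modeled as an equal static split between the two hardware threads; in "Stretch" mode the batch thread receives the larger partition. Each thread stalls at dispatch once its own partition is full.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RobPartition:
    """Per-thread ROB entry limits on a 2-way SMT core (hypothetical model)."""
    lc_entries: int     # entries for the latency-sensitive thread
    batch_entries: int  # entries for the co-running batch thread


def partition_rob(rob_size: int, stretch: bool, lc_share: float = 0.25) -> RobPartition:
    """Baseline: equal static split. Stretch: the latency-sensitive thread keeps
    only lc_share of the ROB and the batch thread gets the rest.
    lc_share is an assumed parameter, not a value from the paper."""
    if not 0.0 < lc_share <= 0.5:
        raise ValueError("lc_share must be in (0, 0.5]")
    if not stretch:
        half = rob_size // 2
        return RobPartition(half, rob_size - half)
    lc = max(1, int(rob_size * lc_share))
    return RobPartition(lc, rob_size - lc)


def should_enable_stretch(current_load: float, peak_load: float,
                          threshold: float = 0.7) -> bool:
    """Hypothetical system-software policy: enable Stretch only while the
    latency-sensitive service runs well below its peak load, i.e. while it
    still has latency slack relative to its QoS target."""
    return current_load < threshold * peak_load


class ThreadRob:
    """ROB occupancy of one hardware thread, capped by its partition limit."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.occupied = 0

    def can_dispatch(self) -> bool:
        # With static partitioning, a thread whose partition is full stalls at
        # dispatch even if the other thread's partition has free entries.
        return self.occupied < self.limit

    def dispatch(self) -> None:
        if not self.can_dispatch():
            raise RuntimeError("ROB partition full: dispatch stalls")
        self.occupied += 1

    def retire(self) -> None:
        if self.occupied == 0:
            raise RuntimeError("nothing to retire")
        self.occupied -= 1


if __name__ == "__main__":
    rob_size = 192                 # hypothetical ROB size
    load, peak = 30_000, 100_000   # hypothetical requests/s
    part = partition_rob(rob_size, stretch=should_enable_stretch(load, peak))
    print(part)  # RobPartition(lc_entries=48, batch_entries=144)
```

In this toy model, the batch thread's window grows from 96 to 144 entries when the service runs at 30% of peak, while the latency-sensitive thread keeps enough entries to make progress. As load approaches peak, the policy reverts to the equal split so that the latency-sensitive thread regains its full share.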

Details

Title Stretch: Balancing QoS and Throughput for Colocated Server Workloads on SMT Cores

Author(s) Margaritov, Artemiy ; Gupta, Siddharth ; Gonzalez-Alberquilla, Rekai ; Grot, Boris

Published in 2019 25th IEEE International Symposium on High Performance Computer Architecture (HPCA)

Series International Symposium on High-Performance Computer Architecture - Proceedings

Pages 15-27

Conference 25th IEEE International Symposium on High Performance Computer Architecture (HPCA), Washington, DC, Feb 16-20, 2019

Date 2019-01-01

Publisher New York, IEEE

ISSN 1530-0897

ISBN 978-1-7281-1444-6

Keywords

quality of service; datacenter; simultaneous multi-threading; latency-sensitive applications; microarchitecture; resource-allocation; latency; parallelism; policy

DOI https://doi.org/10.1109/HPCA.2019.00024


Laboratories PARSA

Record appears in
Scientific production and competences > I&C - School of Computer and Communication Sciences > IINFCOM > PARSA - Parallel Systems Architecture Laboratory
Peer-reviewed publications
Conference Papers
Work produced at EPFL
Published

Record creation date 2019-06-18