Stretch: Balancing QoS and Throughput for Colocated Server Workloads on SMT Cores

Margaritov, Artemiy; Gupta, Siddharth; Gonzalez-Alberquilla, Rekai; Grot, Boris

doi:10.1109/HPCA.2019.00024

conference paper


January 1, 2019

2019 25th IEEE International Symposium on High Performance Computer Architecture (HPCA)

25th IEEE International Symposium on High Performance Computer Architecture (HPCA)

In a drive to maximize resource utilization, today's datacenters are moving toward colocating latency-sensitive and batch workloads on the same server. State-of-the-art deployments, such as those at Google, colocate such diverse workloads even on a single SMT core. This aggressive colocation is possible because a latency-sensitive service operating below its peak load has significant slack in its response latency with respect to its QoS target. This slack can absorb the loss of single-thread performance that is inevitable under SMT colocation without compromising QoS targets. This work observes that many batch applications benefit greatly from a large instruction window, which lets them uncover instruction- and memory-level parallelism (ILP and MLP). Under SMT colocation, conventional wisdom holds that individual hardware threads should be prevented from acquiring and holding a disproportionately large share of microarchitectural resources, so as not to compromise the performance of a co-running thread. We show that the performance slack inherent in latency-sensitive workloads operating at low to moderate load makes it safe to shift microarchitectural resources to a co-running batch thread without compromising QoS targets. Based on this insight, we introduce Stretch, a simple reorder buffer (ROB) partitioning scheme that system software invokes to give one hardware thread a much larger ROB partition at the expense of another thread. When Stretch is enabled for latency-sensitive workloads operating below their peak load on an SMT core, co-running batch applications gain 13% performance on average (up to 30%) over baseline SMT colocation, without compromising QoS constraints.
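The allocation idea in the abstract can be illustrated with a small model of the policy. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a two-thread SMT core, a hypothetical 192-entry ROB, a hypothetical 75% batch share when Stretch is on, and a hypothetical rule that system software enables Stretch whenever the latency-sensitive service runs below 70% of its peak load. All names (RobPartition, partition_rob, should_stretch) are illustrative. The sketch models only the decision and the resulting entry counts, not the hardware that enforces the partition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RobPartition:
    """ROB entries assigned to each of two SMT threads (illustrative model)."""
    latency_sensitive: int
    batch: int


def partition_rob(total_entries: int, stretch_enabled: bool,
                  batch_share: float = 0.75) -> RobPartition:
    """Split a shared ROB between a latency-sensitive and a batch thread.

    Baseline: an equal static split, the conventional SMT policy.
    Stretch mode: the batch thread receives `batch_share` of the entries
    (0.75 is a hypothetical value, not one taken from the paper).
    """
    if total_entries < 2:
        raise ValueError("need at least one entry per thread")
    if not stretch_enabled:
        half = total_entries // 2
        # Odd leftover entry goes to the latency-sensitive thread.
        return RobPartition(latency_sensitive=total_entries - half, batch=half)
    if not 0.5 <= batch_share < 1.0:
        raise ValueError("batch_share must be in [0.5, 1.0)")
    batch = min(int(total_entries * batch_share), total_entries - 1)
    # Leave the latency-sensitive thread at least one entry.
    return RobPartition(latency_sensitive=total_entries - batch, batch=batch)


def should_stretch(current_load_qps: float, peak_load_qps: float,
                   load_threshold: float = 0.7) -> bool:
    """Enable Stretch only while the latency-sensitive service runs well
    below its peak load, i.e. while it still has latency slack.
    The 0.7 threshold is a hypothetical placeholder."""
    if peak_load_qps <= 0:
        raise ValueError("peak load must be positive")
    return current_load_qps / peak_load_qps < load_threshold


if __name__ == "__main__":
    ROB_ENTRIES = 192  # hypothetical ROB size
    for load in (2_000, 9_000):
        stretch = should_stretch(load, peak_load_qps=10_000)
        print(load, partition_rob(ROB_ENTRIES, stretch))
    # 2000 RobPartition(latency_sensitive=48, batch=144)
    # 9000 RobPartition(latency_sensitive=96, batch=96)
```

At low load (20% of peak) the batch thread gets three quarters of the ROB; near peak load (90%) the policy falls back to the equal split so the latency-sensitive thread keeps its full share. Gating the asymmetric split on load is what lets the batch thread borrow entries only while QoS slack exists.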

Type

conference paper

DOI

10.1109/HPCA.2019.00024

Web of Science ID

WOS:000469766300002

Author(s)

Margaritov, Artemiy

Gupta, Siddharth

Gonzalez-Alberquilla, Rekai

Grot, Boris

Date Issued

2019-01-01

Publisher

IEEE

Publisher place

New York

Published in

2019 25th IEEE International Symposium on High Performance Computer Architecture (HPCA)

ISBN of the book

978-1-7281-1444-6

Series title/Series vol.

International Symposium on High-Performance Computer Architecture-Proceedings

Start page

15

End page

27

Subjects

Computer Science, Hardware & Architecture

Computer Science

quality of service

datacenter

simultaneous multi-threading

latency-sensitive applications

microarchitecture

resource-allocation

latency

parallelism

policy

Editorial or Peer reviewed

REVIEWED

Written at

EPFL

EPFL units

PARSA

Event name

25th IEEE International Symposium on High Performance Computer Architecture (HPCA)

Event place

Washington, DC

Event date

Feb 16-20, 2019

Available on Infoscience

June 18, 2019

Use this identifier to reference this record

https://infoscience.epfl.ch/handle/20.500.14299/157433