Infoscience
conference paper

Stretch: Balancing QoS and Throughput for Colocated Server Workloads on SMT Cores

Margaritov, Artemiy • Gupta, Siddharth • Gonzalez-Alberquilla, Rekai • Grot, Boris
January 1, 2019
25th IEEE International Symposium on High Performance Computer Architecture (HPCA)

In a drive to maximize resource utilization, today's datacenters are moving to colocate latency-sensitive and batch workloads on the same server. State-of-the-art deployments, such as those at Google, colocate such diverse workloads even on a single SMT core. This aggressive form of colocation is possible because a latency-sensitive service operating below its peak load has significant slack in its response latency relative to its QoS target. That slack absorbs the degradation in single-thread performance that is inevitable under SMT colocation, without compromising QoS targets. This work observes that many batch applications benefit greatly from a large instruction window, which uncovers instruction-level parallelism (ILP) and memory-level parallelism (MLP). Under SMT colocation, conventional wisdom holds that individual hardware threads should be limited in their ability to acquire and hold a disproportionately large share of microarchitectural resources, so as not to compromise the performance of a co-running thread. We show that the performance slack inherent in latency-sensitive workloads operating at low to moderate load makes it safe to shift microarchitectural resources to a co-running batch thread without compromising QoS targets. Based on this insight, we introduce Stretch, a simple reorder buffer (ROB) partitioning scheme that is invoked by system software to provide one hardware thread with a much larger ROB partition at the expense of another thread. When Stretch is enabled for latency-sensitive workloads operating below their peak load on an SMT core, co-running batch applications gain 13% in performance on average (up to 30%) over baseline SMT colocation, without compromising QoS constraints.

Type
conference paper
DOI
10.1109/HPCA.2019.00024
Web of Science ID

WOS:000469766300002

Author(s)
Margaritov, Artemiy
Gupta, Siddharth  
Gonzalez-Alberquilla, Rekai
Grot, Boris  
Date Issued

2019-01-01

Publisher

IEEE

Publisher place

New York

Published in
2019 25th IEEE International Symposium on High Performance Computer Architecture (HPCA)
ISBN of the book

978-1-7281-1444-6

Series title/Series vol.

International Symposium on High-Performance Computer Architecture-Proceedings

Start page

15

End page

27

Subjects

Computer Science, Hardware & Architecture • Computer Science • quality of service • datacenter • simultaneous multi-threading • latency-sensitive applications • microarchitecture • resource-allocation • latency • parallelism • policy

Editorial or Peer reviewed

REVIEWED

Written at

EPFL

EPFL units
PARSA  
Event name

25th IEEE International Symposium on High Performance Computer Architecture (HPCA)

Event place

Washington, DC

Event date

Feb 16-20, 2019

Available on Infoscience
June 18, 2019
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/157433