Infoscience

conference paper

Tail-tolerance as a Systems Principle not a Metric

Kogias, Marios • Bugnion, Edouard
January 1, 2020
Proceedings of the 2020 4th Asia-Pacific Workshop on Networking (APNet 2020)
4th Asia-Pacific Workshop on Networking (APNet)

Tail-latency tolerance (or simply tail-tolerance) is the ability of a system to deliver a low-latency response nearly all the time. It is typically expressed as a system metric (e.g., the 99th or 99.99th percentile latency) or as a service-level objective (SLO) (e.g., the maximum throughput at which the tail latency stays below a desired threshold). We advocate instead that modern datacenter systems should incorporate tail-tolerance as a core systems design principle rather than a metric to be observed, and that tail-tolerant systems can be built out of large and complex applications whose individual components may suffer from latency deviations. This is analogous to fault-tolerance, where a fault-tolerant system can be built out of unreliable components. The general solution is for the system to control the applied load and keep it below the threshold that violates the latency SLO. We propose to augment RPC semantics with an architectural layer that measures the observed tail latency and probabilistically rejects RPC requests, keeping throughput below the threshold that violates the SLO. Our design is application-independent and makes no assumptions about the request service-time distribution. We implemented a proof of concept of such a tail-tolerant layer, called SVEN, using programmable switches. We demonstrate that the approach is suitable even for microsecond-scale RPCs with variable service times. Moreover, our approach induces no measurable overhead and keeps the maximum achieved throughput very close to the load level at which the SLO would be violated without SVEN.
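The core mechanism described in the abstract (measure the observed tail latency, then probabilistically reject RPCs so that the admitted load stays below the point where the SLO is violated) can be illustrated with a minimal Python sketch. This is not SVEN, which runs on programmable switches. The class name `TailTolerantAdmission`, the sliding-window nearest-rank percentile estimate, and the fixed-step adjustment of the rejection probability are all illustrative assumptions, not the paper's algorithm.

```python
import math
import random
from collections import deque


def percentile(samples, q):
    """Nearest-rank percentile of a non-empty sequence; q is in (0, 100]."""
    ordered = sorted(samples)
    rank = math.ceil(q / 100.0 * len(ordered))
    return ordered[max(rank, 1) - 1]


class TailTolerantAdmission:
    """Hypothetical admission-control layer in front of an RPC server.

    Keeps a sliding window of observed RPC latencies, estimates the tail
    latency over that window, and nudges a rejection probability up when
    the tail exceeds the SLO and down when the tail is within the SLO.
    """

    def __init__(self, slo_us, q=99.0, window=1000, step=0.01,
                 max_drop=0.99, rng=None):
        self.slo_us = slo_us                    # latency SLO threshold (microseconds)
        self.q = q                              # tail percentile to enforce, e.g. p99
        self.latencies = deque(maxlen=window)   # most recent latency samples only
        self.step = step                        # change in drop_prob per completion
        self.max_drop = max_drop                # kept < 1.0 so some requests still probe
        self.drop_prob = 0.0                    # current rejection probability
        self.rng = rng if rng is not None else random.Random()

    def admit(self):
        """Return True to forward the RPC, False to reject it immediately."""
        return self.rng.random() >= self.drop_prob

    def record(self, latency_us):
        """Record a completed RPC's latency and adjust the rejection probability."""
        self.latencies.append(latency_us)
        tail = percentile(self.latencies, self.q)
        if tail > self.slo_us:
            # Observed tail violates the SLO: shed more load.
            self.drop_prob = min(self.max_drop, self.drop_prob + self.step)
        else:
            # Observed tail meets the SLO: admit more load again.
            self.drop_prob = max(0.0, self.drop_prob - self.step)
```

In use, a front end would call `admit()` before forwarding each RPC and `record()` with the measured latency when the response returns. Capping the rejection probability below 1.0 ensures some requests keep flowing, so the latency window can recover once load drops. Sorting the whole window on every completion is deliberately simple; a line-rate data-plane implementation would need a much cheaper tail estimator.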

Type
conference paper
DOI
10.1145/3411029.3411032
Web of Science ID

WOS:001147813800003

Author(s)
Kogias, Marios  

École Polytechnique Fédérale de Lausanne

Bugnion, Edouard  

École Polytechnique Fédérale de Lausanne

Date Issued

2020-01-01

Publisher

Association for Computing Machinery

Publisher place

New York

Published in
Proceedings of the 2020 4th Asia-Pacific Workshop on Networking (APNet 2020)
ISBN of the book

978-1-4503-8876-4

Start page

16

End page

22

Subjects

Science & Technology • Technology

Editorial or Peer reviewed

REVIEWED

Written at

EPFL

EPFL units
DCSL  
Event name

4th Asia-Pacific Workshop on Networking (APNet)

Event place

Electronic network (virtual event)

Event date

2020-08-03 to 2020-08-04

Available on Infoscience
January 31, 2025
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/246096