Repository logo

Infoscience

  • English
  • French
Log In
Logo EPFL, École polytechnique fédérale de Lausanne

Infoscience

  • English
  • French
Log In
  1. Home
  2. Academic and Research Output
  3. Conferences, Workshops, Symposiums, and Seminars
  4. Tail-tolerance as a Systems Principle not a Metric
 
Loading...
Thumbnail Image
conference paper

Tail-tolerance as a Systems Principle not a Metric

Kogias, Marios  
•
Bugnion, Edouard  
January 1, 2020
Proceedings Of 2020 4Th Asia-Pacific Workshop On Networking, Apnet 2020
4th Asia-Pacific Workshop on Networking (APNet)

Tail-latency tolerance (or just simply tail-tolerance) is the ability for a system to deliver a response with low-latency nearly all the time. It it typically expressed as a system metric (e.g., the 99th or 99.99th percentile latency) or as a service-level objective (e.g., the maximum throughput so that the tail latency is below a desired threshold). We advocate instead that modern datacenter systems should incorporate tail-tolerance as a core systems design principle and not a metric to be observed, and that tail-tolerant systems can be built out of large and complex applications whose individual components may suffer from latency deviations. This is analogous to fault-tolerance, where a fault-tolerant system can be built out of unreliable components. The general solution is for the system to control the applied load and keep it under the threshold that violates the latency SLO. We propose to augment RPC semantics with an architectural layer that measures the observed tail latency and probabilistically rejects RPC requests maintaining throughput under the threshold that violates the SLO. Our design is application-independent, and does not make any assumptions about the request service time distribution. We implemented a proof of concept for such a tail-tolerant layer using programmable switches, called SVEN. We demonstrate that the approach is suitable even for microsecond-scale RPCs with variable service times. Moreover, our approach does not induce measurable overheads, and can maintain the maximum achieved throughput very close to the load level that would violate the SLO without SVEN.

  • Details
  • Metrics
Logo EPFL, École polytechnique fédérale de Lausanne
  • Contact
  • infoscience@epfl.ch

  • Follow us on Facebook
  • Follow us on Instagram
  • Follow us on LinkedIn
  • Follow us on X
  • Follow us on Youtube
AccessibilityLegal noticePrivacy policyCookie settingsEnd User AgreementGet helpFeedback

Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, tous droits réservés