Understanding and Mitigating Latency Variability of Latency-Critical Applications
A major theme of IT in the past decade has been the shift from on-premise hardware to cloud computing. Running a service in the public cloud is practical because large numbers of resources can be bought on demand, but this shift comes with its own set of challenges, e.g., customers have less control over their environment. In scale-out deployments of on-line data-intensive services, each client request typically executes queries on many servers in parallel, and its final response time is often lower-bounded by the slowest query execution. There are numerous hard-to-trace reasons why some queries finish later than others. Latency variability appears at different timescales, and to understand its cause and impact we need to measure it correctly. Hardware support is clearly needed to measure latencies at the nanosecond scale, and software-based tools are good enough at the millisecond scale, but it is not clear whether software-based tools are also reliable at the microsecond scale, which is of growing interest to a large research and industry community.
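As a toy illustration of why purely software-based measurement becomes questionable at the microsecond scale, consider the following minimal Go sketch (it is not the dissertation's methodology, which relies on kernel bypass and hardware timestamps; the sample count is an arbitrary assumption). It measures the gap between back-to-back software clock readings; the jitter of that gap sets a floor on the precision of any latency measurement taken with software timestamps alone.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// Estimate the overhead and jitter of software timestamping by taking
// back-to-back clock readings. If the gap between two consecutive reads
// already fluctuates by a sizeable fraction of a microsecond, software
// timestamps cannot reliably resolve microsecond-scale tail latencies.
func main() {
	const n = 100000
	deltas := make([]time.Duration, n)
	for i := 0; i < n; i++ {
		t0 := time.Now()
		t1 := time.Now()
		deltas[i] = t1.Sub(t0)
	}
	sort.Slice(deltas, func(i, j int) bool { return deltas[i] < deltas[j] })
	fmt.Printf("clock-read gap: median=%v p99=%v max=%v\n",
		deltas[n/2], deltas[n*99/100], deltas[n-1])
}
```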
Measuring is a first step towards understanding the problems that cause latency variability, but what if some of them are extremely complex, or fixing them is out of reach given a service provider's restrictions? Even large companies that own datacenters and build their own software and hardware stacks sometimes suffer from hard-to-understand or hard-to-fix performance issues. Medium-sized companies that simply use shared public infrastructure, and do not themselves develop most of the code running on their machines, have even fewer options and often cannot rule out every source of system interference. This reality inspired a line of research on what are known as hedging policies: by issuing redundant requests, we can reduce overall latency at the cost of consuming additional resources.
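As a minimal illustration of the hedging idea (a generic sketch, not LAEDGE itself; the replica count, hedging delay, and simulated latency distribution are all illustrative assumptions), the following Go program issues a request and, if no reply arrives within a fixed hedging delay, duplicates it to a second replica and takes whichever reply comes first:

```go
package main

import (
	"context"
	"fmt"
	"math/rand"
	"time"
)

// query simulates a backend call whose latency has a heavy tail.
// (Hypothetical stand-in for a real RPC; parameters are illustrative.)
func query(ctx context.Context, replica int) (string, error) {
	d := time.Duration(rand.ExpFloat64() * float64(time.Millisecond))
	if rand.Float64() < 0.05 {
		d += 50 * time.Millisecond // occasional straggler
	}
	select {
	case <-time.After(d):
		return fmt.Sprintf("reply from replica %d", replica), nil
	case <-ctx.Done():
		return "", ctx.Err()
	}
}

// hedged sends the request to one replica; if no reply arrives within
// hedgeAfter, it sends a duplicate to a second replica. The first reply
// wins, trading extra resource consumption for a shorter tail.
func hedged(ctx context.Context, hedgeAfter time.Duration) (string, error) {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel() // abandon the losing request on return

	replies := make(chan string, 2) // buffered so the loser never blocks
	go func() {
		if r, err := query(ctx, 0); err == nil {
			replies <- r
		}
	}()

	select {
	case r := <-replies:
		return r, nil // fast path: no hedge needed
	case <-time.After(hedgeAfter):
		go func() { // hedge: duplicate the request to another replica
			if r, err := query(ctx, 1); err == nil {
				replies <- r
			}
		}()
	}

	select {
	case r := <-replies:
		return r, nil
	case <-ctx.Done():
		return "", ctx.Err()
	}
}

func main() {
	r, _ := hedged(context.Background(), 2*time.Millisecond)
	fmt.Println(r)
}
```

The central design question, which this dissertation studies, is when to hedge: duplicating too eagerly wastes capacity, while duplicating too late forfeits the latency benefit.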
This dissertation characterizes the latency variability of interactive services, shows how to measure it, and shows how to mitigate its effect on end-to-end latency without sacrificing system capacity. Concretely, it makes three contributions. First, it shows how to measure microsecond-scale latency variability with both low cost and high precision, with the help of kernel bypass and hardware timestamps. Second, it empirically derives a lower bound on the tail latency that any implementable hedging policy can achieve for a given workload; by evaluating this lower bound over a large parameter space, we determine when hedging is beneficial. Lastly, it describes and evaluates a practical policy, LAEDGE, that approximates the theoretical lower bound and achieves as much as half of its hedging potential. We show the applicability of our solution to real applications deployed in the public cloud, where LAEDGE outperforms state-of-the-art scheduling policies by up to 49%, averaged over low to medium load.