Operating System and Network Co-Design for Latency-Critical Datacenter Applications
Datacenters are the heart of our digital lives. Online applications, such as social networking and e-commerce, run inside datacenters under strict Service Level Objectives (SLOs) for their tail latency. Tight latency SLOs are necessary for such services to remain interactive and keep users engaged. At the same time, datacenters operate under a single administrative domain, which enables the deployment of customized network and hardware solutions tailored to specific application requirements. Customization enables the design of datacenter-tailored, SLO-aware mechanisms that are more efficient and perform better.
In this thesis we focus on three main datacenter challenges. First, latency-critical, in-memory datacenter applications have µs-scale service times and run on top of hardware infrastructure that is also capable of µs-scale inter-node round-trip times. Existing operating systems, though, were designed under completely different assumptions and are not ready for µs-scale computing. Second, datacenter communication is built on Remote Procedure Calls (RPCs), which follow a message-oriented paradigm, while TCP still remains widely used for intra-datacenter communication. The mismatch between TCP's bytestream-oriented abstraction and RPCs causes several inefficiencies and deteriorates tail latency. Finally, datacenter applications follow a scale-out paradigm based on large fan-out communication schemes. In such a scenario, tail latency becomes a critical metric due to the tail-at-scale problem. The two main factors that affect tail latency are interference and scheduling/load-balancing decisions.
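To make the tail-at-scale effect concrete, consider a standard back-of-the-envelope calculation (an illustration of the general problem, not a result from this thesis): if each of N leaf servers misses its latency target with probability p, a fan-out request that must wait for all of them misses it with probability 1 - (1 - p)^N; with p = 0.01 and N = 100 this is already about 63%. A minimal sketch of this arithmetic, using hypothetical helper names:

    # Illustration of the tail-at-scale effect (hypothetical helper, not thesis code):
    # a fan-out request is only as fast as its slowest leaf response.
    def slow_request_probability(p_slow_leaf: float, fan_out: int) -> float:
        """Probability that at least one of `fan_out` leaves misses its latency target."""
        return 1.0 - (1.0 - p_slow_leaf) ** fan_out

    # A 1% per-leaf tail becomes a ~63% whole-request tail at fan-out 100.
    print(f"{slow_request_probability(0.01, 100):.0%}")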
To deal with the above challenges, we advocate for a co-design of network and operating system mechanisms targeting µs-scale tail optimisations for latency-critical datacenter applications. Our approach investigates the potential of pushing functionality into the network by leveraging emerging in-network programmability features. Whenever existing abstractions fail to meet the µs-scale requirements or restrict our design space, we propose new ones, given the design and deployment freedom the datacenter offers.
The contributions of this thesis can be split into three main parts. We first design and build tools and methodologies for µs-scale latency measurements and system evaluation. Our approach rests on a robust theoretical background in statistics and queueing theory. We then revisit existing operating system and networking mechanisms for TCP-based datacenter applications: we design an OS scheduler for µs-scale tasks, modify TCP to improve L4 load balancing, and provide an SLO-aware flow-control mechanism. Finally, after identifying the problems of TCP-based RPC services, we introduce a new transport protocol for datacenter RPCs and in-network policy enforcement that enables us to push functionality to the network. We showcase how the new protocol improves the performance and simplifies the implementation of in-network RPC load balancing, SLO-aware RPC flow control, and application-agnostic fault-tolerant RPCs.
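As one hedged example of how queueing theory anchors this kind of tail-latency reasoning (a textbook M/M/1 sketch under my own assumptions, not the specific models used in the thesis): for an M/M/1 server with arrival rate λ and service rate µ, the sojourn time is exponentially distributed with rate µ − λ, so the 99th-percentile latency is ln(100)/(µ − λ).

    import math

    # Hypothetical M/M/1 sketch (assumed model, not thesis code): p-th percentile
    # of sojourn time for arrival rate lam and service rate mu (requires lam < mu),
    # since sojourn time is exponentially distributed with rate (mu - lam).
    def mm1_latency_percentile(lam: float, mu: float, p: float = 0.99) -> float:
        assert lam < mu, "queue must be stable"
        return -math.log(1.0 - p) / (mu - lam)

    # E.g., a 1 us service time (mu = 1e6 req/s) at 80% load yields a p99 of ~23 us.
    print(mm1_latency_percentile(0.8e6, 1.0e6))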