Abstract

Online services have become ubiquitous in modern society, and global demand for them has driven enterprises to construct gigantic datacenters to run their software. Such facilities have also recently become a substrate for third-party organizations, owing to the advantages of moving infrastructure to the cloud. Developing, releasing, and maintaining software at datacenter scale has given rise to a software architecture composed of many independent microservices, each fulfilling a single role and communicating through an enforced API, most commonly Remote Procedure Calls (RPCs).

As microservices have become standard practice for datacenter-scale software, the datacenter's underlying components must support them efficiently. The growing adoption of microservice architectures implies a drastic increase in network communication, because each microservice receives and creates many RPCs that often execute for only a few microseconds (µs). Delivering users an interactive, low-latency service therefore becomes more challenging, because each request involves more interactions with the components implementing the communication stack. It is particularly difficult to ensure that the latency of the slowest responses, called the "tail latency", remains acceptable to the service's users. Datacenter system design is consequently undergoing a rapid shift to let programmers reap the benefits of microservices without their performance pitfalls. Handling RPCs from µs-scale software at the line rates of today's NICs, which deliver up to 400 Gbps, is an open challenge that will require designing all layers of the communication stack to natively support RPC semantics.

Although the network and protocol layers have drastically improved their performance by treating RPCs as a primary design objective, server hardware has not yet done so. We therefore posit that now is the time for an RPC-centric server architecture to emerge, allowing server endpoints to match the performance of their surrounding system components. To that end, this thesis introduces hardware and software support for an RPC-centric server architecture. We first make the case that today's hardware-terminated network transport protocols grossly over-provision buffering because they are agnostic to the latency constraints inherent in each RPC; simply exposing such RPC-level information to hardware yields 1.25-2.2x better performance. Motivated by prior work demonstrating the RPC stack's burdensome cost, we then show how a previously proposed RPC stack accelerator can be integrated with the implementation of our aforementioned NIC protocol. Finally, we propose new NIC-driven load-balancing policies that boost microservice throughput via improved locality while simultaneously maintaining tail-latency guarantees.

Our proposals improve 99th-percentile tail latency in data stores by 2-5.5x and reduce instruction cache misses in stateless microservices by 1.1-1.8x. In summary, we present evidence that designing and implementing a server's NIC hardware to natively support RPC semantics removes protocol scalability bottlenecks and enables microservices to enjoy further performance benefits.
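To make the buffering argument concrete, the sketch below shows one way a transport header could expose an RPC's latency budget to the NIC, letting hardware make a per-RPC buffering decision instead of provisioning for the worst case. This is a minimal illustration, not the thesis's actual protocol: the struct fields, the `slo_us` budget, the `EST_SERVICE_US` constant, and the admission rule are all hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical transport header: unlike an RPC-agnostic protocol, it
 * carries the caller's latency budget so the NIC can reason per-RPC. */
struct rpc_hdr {
    uint64_t rpc_id;       /* matches responses to requests */
    uint32_t payload_len;  /* bytes in this RPC's payload */
    uint32_t slo_us;       /* caller's end-to-end latency budget (µs) */
};

/* Assumed on-server service time per RPC (hypothetical constant). */
#define EST_SERVICE_US 10u

/* Admit an arriving RPC into scarce on-NIC buffer space only when doing
 * so can still meet its budget; otherwise backpressure the sender. An
 * RPC-agnostic NIC must instead size buffers for the full
 * bandwidth-delay product, which is the over-provisioning this avoids. */
static bool admit_to_nic_buffer(const struct rpc_hdr *h,
                                uint32_t queued_us, uint32_t free_bytes)
{
    if (h->payload_len > free_bytes)
        return false;  /* no space: defer rather than drop */
    return queued_us + EST_SERVICE_US < h->slo_us;
}

int main(void)
{
    struct rpc_hdr h = { .rpc_id = 1, .payload_len = 256, .slo_us = 50 };
    return admit_to_nic_buffer(&h, 20, 4096) ? 0 : 1;
}
```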
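The load-balancing idea admits a similarly small sketch. Below is one hypothetical NIC dispatch policy, again with invented names and thresholds rather than the thesis's design: steer each RPC to a core affinitized to its RPC type, which keeps that handler's code hot in the core's instruction cache, but spill to the least-loaded core once the preferred queue grows, bounding queueing delay and thus the tail.

```c
#include <stdint.h>
#include <stdio.h>

#define NUM_CORES       16u
#define NUM_RPC_TYPES   64u
#define SPILL_THRESHOLD  8u  /* hypothetical queue-depth bound */

static uint32_t queue_depth[NUM_CORES];   /* pending RPCs per core */
static uint32_t affinity[NUM_RPC_TYPES];  /* preferred core per RPC type */

/* Choose a destination core for an incoming RPC of the given type. */
static uint32_t dispatch_core(uint32_t rpc_type)
{
    uint32_t pref = affinity[rpc_type % NUM_RPC_TYPES];

    /* Common case: the affinitized core already holds this RPC type's
     * handler in its i-cache, so prefer it while its queue is short. */
    if (queue_depth[pref] < SPILL_THRESHOLD)
        return pref;

    /* Tail-latency guard: rather than queueing behind a backlog,
     * spill to the globally least-loaded core. */
    uint32_t best = 0;
    for (uint32_t c = 1; c < NUM_CORES; c++)
        if (queue_depth[c] < queue_depth[best])
            best = c;
    return best;
}

int main(void)
{
    affinity[3] = 5;      /* pin RPC type 3 to core 5 */
    queue_depth[5] = 2;   /* short queue: dispatch stays local */
    printf("type 3 -> core %u\n", dispatch_core(3));
    queue_depth[5] = 12;  /* long queue: dispatch spills elsewhere */
    printf("type 3 -> core %u\n", dispatch_core(3));
    return 0;
}
```

The two branches capture the trade-off named in the abstract: the affinity path buys locality (fewer instruction cache misses), while the spill path caps the queueing component of tail latency.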
