Reliable Microsecond-Scale Distributed Computing

Guerraoui, RachidXygkis, Athanasios2023-10-032023-10-032023-10-03202310.5075/epfl-thesis-9909https://infoscience.epfl.ch/handle/20.500.14299/201354The landscape of computing is changing, thanks to the advent of modern networking equipment that allows machines to exchange information in as little as one microsecond. Such advancement has enabled microsecond-scale distributed computing, where entire distributed services (e.g., trading systems, key-value stores) execute within a few microseconds. As a result, these services manage to operate in a timescale that was not possible before, thus becoming the foundation for several other dependent services. The high-level problem we study in this thesis is that of making microsecond-scale computing reliable, while simultaneously minimizing the latency overhead of achieving reliability. Reliability is essential, as it allows services to provide non-stop operation to their users in spite of failures. The recipe to achieving reliability is the well known technique of replication, yet replication can undesirably incur significant latency to microsecond-scale applications. We approach this problem from two fronts, as follows. First, we observe a weakness in the way of achieving reliability in existing microsecond-scale applications. Such applications rely on replication but they only look at making replication efficient in the absence of failures. However, when failures occur, their latency cost increases by at least two orders of magnitude. This behavior results in latency spikes, making microsecond applications operate at milliseconds during periods of active failures. We set out to create uKharon, a membership service for the microsecond scale that aims at addressing these issues by exposing a simple and reusable interface that forms the basis for all microsecond-scale applications requiring high reliability. We additionally showcase how uKharon can be used to replicate a microsecond-scale application with minimal latency overhead. Second, we take a step back and look at how to provide reliability under different failure models. uKharon assumes the standard failure model of crashes, yet in certain occasions failures can go beyond crashes. As such, we also look at Byzantine (i.e., arbitrary) failures, in an attempt to safeguard applications from spurious failures, such as data corruption and malicious behavior. However, dealing with Byzantine faults has always been associated with high cost, either due to the number of machines required for replication, or due to the necessary - yet computationally expensive - cryptographic signatures. We begin by studying the theoretical limitations of achieving frugality, i.e., using few replicas and minimizing signatures, in shared-memory distributed computing. With this knowledge, we then create uBFT, a Byzantine-resilient replication engine that relies on disaggregated memory, a technology enabled by modern networking adapters that realizes shared memory in practice. Due to its frugality, uBFT becomes the first system to offer Byzantine-resilient microsecond-scale replication.enByzantine-fault tolerancecrash-fault toleranceconsensusconsistent broadcastfailure detectionmembership serviceRemote Direct Memory Access(RDMA)non-equivocationreplicationstate machine replicationReliable Microsecond-Scale Distributed Computingthesis::doctoral thesis