Scaling Byzantine Fault Tolerance

Online services are becoming more and more ubiquitous and keep growing in scale. At the same time, they are required to be highly available, secure, energy-efficient, and to achieve high performance. To ensure these (and many other) properties, replication and distribution of these services becomes inevitable. Indeed, today’s online services often involve thousands of processes running on different machines interconnected by a communication network. These processes may experience various kinds of failures, from simply crashing to being compromised by a malicious (Byzantine) adversary. The classic method for dealing with Byzantine faults is state machine replication (SMR). However, SMR fundamentally relies on a solution of the consensus problem, which often proves to be a scalability bottleneck. This dissertation addresses the scalability of Byzantine fault-tolerant systems. We argue that, for a certain class of applications, consensus either does not need to be solved at all, or only needs to be solved among a limited number of processes. By circumventing the consensus problem where solving it is not necessary, we improve the scalability of these applications. We start by focusing on the particular problem of distributed asset transfer, where digital assets are being transferred between user accounts—a problem underlying many cryptocurrency systems, most of which address it using Byzantine fault-tolerant SMR (and thus consensus). We show that consensus is not required for asset transfer by defining it as a sequential object type in the shared memory model and proving that it has consensus number 1 in Herlihy’s hierarchy. We further generalize the asset transfer object type, allowing an account to be shared by up to k owners. We prove that the consensus number of such an object type is k. We also discuss the asset transfer problem in the message passing model. We devise a consensusless asset transfer algorithm that relies on a secure broadcast primitive that, unlike consensus, has fully asynchronous deterministic implementations. Furthermore, since deterministic implementations of secure broadcast have limited scalability, we propose probabilistic secure broadcast, a variant of secure broadcast where some properties are allowed to be violated with a bounded probability. We design a highly scalable randomized algorithm that implements probabilistic secure broadcast with an arbitrarily low bound on the failure probability. Finally, we present Atum, a system for scalable group communication in a Byzantine environment that supports high churn. Atum achieves scalability by partitioning the system into groups of logarithmic size, only executing a consensus protocol inside each group.

Guerraoui, Rachid
Lausanne, EPFL

Note: The status of this file is: Anyone

 Record created 2019-09-05, last modified 2020-10-24

Download fulltext

Rate this document:

Rate this document:
(Not yet reviewed)