Scaling a Reliable Distributed System
We consider the problem of reliably connecting an arbitrarily large set of computers (nodes) with communication channels. Reliability means here the ability, for any two nodes, to remain connected (i.e., their ability to communicate) with probability at least mu, despite the very fact that every other node or channel has an independent probability lambda of failing. A simple solution to the problem consists in connecting every pair of nodes with several channels. This solution however does not scale: the number of connections per node (degree) would not be bounded by a constant. We address the following question: is it possible to reliably connect an arbitrarily large number n of nodes with a bounded degree? This problem is non-trivial, as the level of redundancy implied by reliability is apparently incompatible with a bounded degree. In this paper, we show that, may be surprisingly, the answer to this problem is positive. We show how to build a graph to reliably connect n nodes while preserving a bounded degree. We first address a weak version of the problem, where we allow to add intermediary nodes (that are not necessarily reliably connected to the others), provided that their number is linear in the total number of nodes reliably connected. To solve the weaker problem, we define a fractal graph that ensures constant reliability at any distance, and combine it with a tree-like graph to reliably connect an arbitrary set of nodes. Then, to solve the strong version of the problem (without intermediary nodes), we split the n nodes to connect into several subsets, and reliably connect each pair of subsets with an instance of the previous graph containing at most n nodes. The final graph is obtained by merging all these instances together. The linearity property of the weak problem ensures that the number of graphs we merge is bounded by a constant, which guarantees a bounded degree. Interestingly, the resulting graph has an optimal diameter: it is logarithmic in n. Whilst we focus on crash-stop failures for presentation simplicity, we also show how our solution can be generalized to tolerate Byzantine (malicious) failures, by increasing the level of redundancy and performing majority votes at several levels of the graph.
Record created on 2016-05-12, modified on 2016-08-09