The complexity of reliable distributed storage

Levy, Ron

doi:10.5075/epfl-thesis-3999

doctoral thesis

The complexity of reliable distributed storage

2008

Distributed storage systems based on commodity hardware have gained in popularity because they are cheaper, can be made more reliable and offer better scalability than centralized storage systems. However, implementing and managing such systems is more complicated due to their distributed nature. This thesis presents efficient and reliable distributed storage that can be read and written by any number of clients. The focus is on atomic storage, which guarantees that from the clients' perspective, the distributed storage behaves exactly like a centralized one. Three key complexity metrics – time, number of logs and throughput – are considered. For each metric, precise performance bounds together with matching algorithms are provided. Experimental results are used to confirm the theoretical performance analysis wherever necessary. Time-complexity is an indication of the latency of read and write operations, i.e. the time between a client's invocation of an operation and the response of the storage. This thesis presents optimal fast atomic storage implementations, namely, implementations that complete both reads and writes in 1 round-trip between the client and the servers. Interestingly, the existence of fast implementations depends on the maximum number of clients that can read from the storage. More precisely, it is shown that a fast implementation is possible if and only if the number of readers is less than n/f – 2, with n servers out of which f can fail. Furthermore, it is shown that fast implementations are impossible for multiple writers if servers can fail. Log-complexity is an indication of the number of stable storage (hard disk) accesses needed in every read or write operation. Stable storage is used to log data in order to prevent data loss after a crash, in a context where servers can crash and recover. This thesis revises the notion of atomicity for this context, determines a lower bound on log-complexity and introduces an atomic storage matching this bound. The optimality of the storage is also established in terms of resilience, as well as time-complexity. Throughput measures the average number of client requests that can be completed per time unit. In order for a distributed storage to serve a high number of clients concurrently, high throughput is required. This thesis introduces an atomic storage that provides optimal read throughput for homogeneous clusters of servers. The storage organizes servers around a ring and uses only point-to-point communication. It is resilient to the crash failure of any number of readers and writers as well as to the crash failure of all but one server. The storage was evaluated on a cluster of 24 nodes. The same storage is modified to solve the more general uniform total order broadcast problem, which can be used to replicate any application reliably. Thus, the first uniform total order broadcast algorithm that is throughput optimal, regardless of message broadcast patterns, is introduced. The algorithm is based on a ring topology, only relies on point-to-point inter-server communication, and has a linear latency with respect to the number of processes. The implementation was benchmarked against two of the most widely used group communication packages and the results confirm that the algorithm is indeed throughput optimal.

Name

EPFL_TH3999.pdf

Access type

openaccess

Size

882.99 KB

Format

Adobe PDF

Checksum (MD5)

a17a573128fb7aea0bf7ff8592534c3f