Internet-scale storage systems under churn - A study of the steady state using Markov models
Content storage in a distributed collaborative environment uses redundancy for better resilience and thus provides good availability and durability. In a peer-to-peer environment, where peers continuously leave and rejoin the network, various lazy strategies can be employed to maintain a minimal redundancy of stored content in the system. Existing static resilience analyses fail to capture in detail the system's behavior over time, particularly the probability mass function of the actual available redundancy, since it ignores the crucial interplay between churn and maintenance operations, and looks only at the average system property. We perform a Markovian time-evolution analysis of the system specified by probability mass function of each possible system state, and establish that given a fixed rate of churn and a specific maintenance strategy, the system operates in a corresponding steady-state (dynamic equilibrium). Understanding the behavior of the system under such a dynamic equilibrium is a fundamental ingredient to precisely evaluate analytically the system's performance and availability as well as to determine the required operational maintenance cost. We also propose a new randomized variant of a lazy-maintenance scheme which has significant performance benefits in comparison to the existing deterministic procrastination based maintenance. We demonstrate the use of our analysis methodology in comparing performance of maintenance schemes using the examples of the new maintenance scheme we propose and the erstwhile best known existing lazy maintenance scheme. The comparative study shows that our randomized lazy maintenance strategy has substantially better resilience at same maintenance cost.