Recovery Algorithms for Paxos-Based State Machine Replication

Konczak, Jan • Wojciechowski, Pawel T. • Santos, Nuno • Zurkowski, Tomasz • Schiper, Andre
March 1, 2021
IEEE Transactions on Dependable and Secure Computing

In this article, we propose and evaluate three state recovery algorithms designed for Paxos, one of the most popular distributed agreement protocols. Paxos is commonly used to maintain consistency among state machine replicas despite process failures. The first algorithm, which we call FullSS, derives from the original Paxos and requires the system to use stable storage frequently during regular (non-faulty) execution. The other two algorithms, ViewSS and EpochSS, rarely access stable storage, and the recovering process must do much less work to restore its lost state and to catch up with the current state of the system. We thoroughly analyze and compare the behavior of the three algorithms during state recovery and during regular, non-faulty execution, under various workloads (e.g., workloads that saturate the network or the CPU). The experimental results show that ViewSS and EpochSS significantly improve process recovery with respect to the original Paxos, provided that a majority of replicas can be assumed to be up and running at any time (not counting replicas that are still recovering). Moreover, these algorithms do not impact the performance of Paxos during regular (non-faulty) operation. However, FullSS is the only one of the three that can be used if the system must tolerate catastrophic failures.
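
The availability assumption stated in the abstract can be illustrated with a small sketch. The `Status` enum and the `majority_is_up` function are hypothetical names introduced here for illustration; they are not taken from the article, and the sketch does not implement FullSS, ViewSS, or EpochSS. It only shows one reading of the condition: a replica that is still recovering does not count toward the majority.

```python
from enum import Enum


class Status(Enum):
    """Hypothetical replica states, used only for this illustration."""
    UP = "up"
    RECOVERING = "recovering"
    CRASHED = "crashed"


def majority_is_up(statuses):
    """Return True if strictly more than half of all replicas are UP.

    Replicas that are RECOVERING are deliberately not counted as up,
    following the wording of the abstract.
    """
    up = sum(1 for s in statuses if s is Status.UP)
    return 2 * up > len(statuses)


# Five replicas: three up, one recovering, one crashed.
ok = majority_is_up(
    [Status.UP, Status.UP, Status.UP, Status.RECOVERING, Status.CRASHED]
)

# Five replicas: only two up, so the assumption does not hold.
not_ok = majority_is_up(
    [Status.UP, Status.UP, Status.RECOVERING, Status.RECOVERING, Status.CRASHED]
)
```

Under this reading, a five-replica system satisfies the assumption only while at least three replicas are fully up; a system that cannot guarantee this is the case the abstract describes as requiring FullSS.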

Type
research article
DOI
10.1109/TDSC.2019.2926723
Web of Science ID
WOS:000628912100009
Author(s)
Konczak, Jan
Wojciechowski, Pawel T.
Santos, Nuno
Zurkowski, Tomasz
Schiper, Andre
Date Issued
2021-03-01
Publisher
IEEE Computer Society
Published in
IEEE Transactions on Dependable and Secure Computing
Volume
18
Issue
2
Start page
623
End page
640
Subjects
Computer Science, Hardware & Architecture • Computer Science, Information Systems • Computer Science, Software Engineering • Computer Science • computer crashes • protocols • fault tolerance • fault tolerant systems • system performance • writing • distributed algorithms • paxos • state machine replication • fault-tolerance
Editorial or Peer reviewed
REVIEWED
Written at
EPFL

Written at

EPFL

EPFL units
LSR-IC  
Available on Infoscience
May 22, 2021
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/178220