conference paper not in proceedings

HovercRaft: Achieving Scalability and Fault-tolerance for microsecond-scale Datacenter Services

Kogias, Marios  
•
Bugnion, Edouard  
2020
EuroSys 2020

Cloud platform services must simultaneously be scalable, meet low tail latency service-level objectives, and be resilient to a combination of software, hardware, and network failures. Replication plays a fundamental role in meeting both the scalability and the fault-tolerance requirements, but is subject to opposing requirements: (1) scalability is typically achieved by relaxing consistency; (2) fault-tolerance is typically achieved through the consistent replication of state machines. Adding nodes to a system can therefore either increase performance at the expense of consistency, or increase resiliency at the expense of performance. We propose HovercRaft, a new approach by which adding nodes increases both the resilience and the performance of general-purpose state-machine replication. We achieve this through an extension of the Raft protocol that carefully eliminates CPU and I/O bottlenecks and load balances requests. Our implementation uses state-of-the-art kernel-bypass techniques, datacenter transport protocols, and in-network programmability to deliver up to 1 million operations/second for clusters of up to 9 nodes, linear speedup over an unreplicated configuration for selected workloads, and a 4× speedup for the YCSB-E benchmark running on Redis over an unreplicated deployment.

Type
conference paper not in proceedings
DOI
10.1145/3342195.3387545
Author(s)
Kogias, Marios  
Bugnion, Edouard  
Date Issued

2020

Subjects

datacenter systems • remote procedure call • microsecond • consensus • raft

Editorial or Peer reviewed

REVIEWED

Written at

EPFL

EPFL units
DCSL  
Event name
EuroSys 2020
Event place
Heraklion, Crete, Greece
Event date
April 27-30, 2020

Available on Infoscience
March 29, 2020
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/167715