conference paper

Microsecond Consensus for Microsecond Applications

Aguilera, Marcos K. • Ben-David, Naama • Guerraoui, Rachid • Marathe, Virendra J. • Xygkis, Athanasios • Zablotchi, Igor
January 1, 2020
Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI '20)
14th USENIX Symposium on Operating Systems Design and Implementation (OSDI)

We consider the problem of making apps fault-tolerant through replication when they operate at the microsecond scale, as in finance, embedded computing, and microservices. These apps need a replication scheme that also operates at the microsecond scale; otherwise replication becomes a burden. We propose Mu, a system that takes less than 1.3 microseconds to replicate a (small) request in memory and less than a millisecond to fail over the system. This cuts the replication and fail-over latencies of prior systems by at least 61% and 90%, respectively. Mu implements bona fide state machine replication/consensus (SMR) with strong consistency for a generic app, but it really shines on microsecond apps, where even the smallest overhead is significant. To provide this performance, Mu introduces a new SMR protocol that carefully leverages RDMA. Roughly, in Mu a leader replicates a request by simply writing it directly to the logs of the other replicas using RDMA, without any additional communication. Doing so, however, introduces the challenges of handling concurrent leaders, changing leaders, garbage collecting the logs, and more; we address these challenges in this paper through a judicious combination of RDMA permissions and distributed algorithmic design. We implemented Mu and used it to replicate several systems: a financial exchange app called Liquibook, Redis, Memcached, and HERD [33]. Our evaluation shows that Mu incurs a small replication latency, in some cases being the only viable replication system that incurs an acceptable overhead.
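The idea of a leader writing directly into the replicas' logs, with write permissions guarding against concurrent leaders, can be illustrated with a small in-process simulation. This is only a sketch under stated assumptions, not Mu's implementation: there is no real RDMA here, and the names `Replica`, `grant`, `remote_write`, `replicate`, and `WriteDenied` are hypothetical. It assumes each replica lets exactly one node write to its log at a time and that a request counts as replicated once a majority of replicas hold it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


class WriteDenied(Exception):
    """Raised when a node without write permission touches a replica's log."""


@dataclass
class Replica:
    name: str
    log: Dict[int, bytes] = field(default_factory=dict)  # slot -> request
    writer: Optional[str] = None  # the single node currently allowed to write

    def grant(self, leader: str) -> None:
        # Granting permission to a new leader revokes it from the old one.
        self.writer = leader

    def remote_write(self, sender: str, slot: int, request: bytes) -> None:
        # Stand-in for a one-sided remote write: the only gate is the
        # permission check, with no reply message from the replica.
        if sender != self.writer:
            raise WriteDenied(f"{sender} may not write to {self.name}")
        self.log[slot] = request


def replicate(leader: str, replicas: List[Replica], slot: int, request: bytes) -> bool:
    """Write the request to every replica; succeed if a majority accepted it."""
    accepted = 0
    for replica in replicas:
        try:
            replica.remote_write(leader, slot, request)
            accepted += 1
        except WriteDenied:
            pass  # this replica has given its permission to another leader
    return accepted > len(replicas) // 2


replicas = [Replica("r1"), Replica("r2"), Replica("r3")]
for r in replicas:
    r.grant("A")
ok_a = replicate("A", replicas, 0, b"buy 10")

# Leader change: a majority of replicas move their permission to B.
replicas[0].grant("B")
replicas[1].grant("B")
stale = replicate("A", replicas, 1, b"sell 5")  # A is now a stale leader
ok_b = replicate("B", replicas, 1, b"sell 7")

print(ok_a, stale, ok_b)
```

In this toy run the stale leader A still reaches one replica, so that replica's slot 1 diverges from the majority. The sketch deliberately leaves out how such leftovers are reconciled, as well as failure detection and log garbage collection, which the abstract lists among the challenges the paper addresses.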

Type: conference paper
Web of Science ID: WOS:000668979500034
Author(s): Aguilera, Marcos K.; Ben-David, Naama; Guerraoui, Rachid; Marathe, Virendra J.; Xygkis, Athanasios; Zablotchi, Igor
Date Issued: 2020-01-01
Publisher: USENIX Association
Publisher place: Berkeley
Published in: Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI '20)
ISBN of the book: 978-1-939133-19-9
Start page: 599
End page: 616
Subjects: Computer Science, Software Engineering; Computer Science
Editorial or Peer reviewed: REVIEWED
Written at: EPFL
EPFL units: DCL
Event name: 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI)
Event place: ELECTR NETWORK
Event date: Nov 04-06, 2020
Available on Infoscience: August 14, 2021
Use this identifier to reference this record: https://infoscience.epfl.ch/handle/20.500.14299/180599