Infoscience

conference paper

Assessing the Crash-Failure Assumption of Group Communication Protocols

Mena, Sergio • Basile, Claudio • Kalbarczyk, Zbigniew • Schiper, André • Iyer, Ravi K.
2005
Proceedings of the 16th IEEE International Symposium on Software Reliability Engineering
16th IEEE International Symposium on Software Reliability Engineering

Designing and correctly implementing Group Communication Systems (GCSs) is notoriously difficult. Assuming that processes fail only by crashing provides a powerful means to simplify the theoretical development of these systems. When making this assumption, however, one should not forget that clean crash failures provide only a coarse approximation of the effects that errors can have in distributed systems. Ignoring this discrepancy can lead to complex GCS-based applications that pay a large price in performance overhead yet fail to deliver the promised level of dependability. This paper provides a thorough study of error effects in real systems by demonstrating an error-injection-driven design methodology, in which error injection is integrated into the core steps of the design process of a robust fault-tolerant system. The methodology is demonstrated for the Fortika toolkit, a Java-based GCS. Error injection enables us to uncover subtle reliability bottlenecks both in the design of Fortika and in the implementation of Java. Based on the insights obtained, we enhance Fortika's design to reduce the identified bottlenecks. Finally, comparing the results obtained for Fortika with those obtained in previous work for the OCaml-based Ensemble system allows us to investigate the reliability implications of the choice of development platform (Java versus OCaml).

Type
conference paper
DOI
10.1109/ISSRE.2005.9
Web of Science ID

WOS:000240903300010

Author(s)
Mena, Sergio  
Basile, Claudio
Kalbarczyk, Zbigniew
Schiper, André  
Iyer, Ravi K.
Date Issued

2005

Published in
Proceedings of the 16th IEEE International Symposium on Software Reliability Engineering
Subjects

Fault injection • Atomic broadcast • Crash-stop model • Group communication • Fault tolerance

URL

http://rachel.utdallas.edu/issre/
Editorial or Peer reviewed

REVIEWED

Written at

EPFL

EPFL units
LSR-IC  
Event name
16th IEEE International Symposium on Software Reliability Engineering
Event place
Chicago, USA
Event date
November 8-11, 2005

Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/220563
Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved