Axo: Detection and Recovery for Delay and Crash Faults in Real-Time Control Systems

Maaz, Mashood Mohiuddin; Saab, Wajeb; Bliudze, Simon; Le Boudec, Jean-Yves

doi:10.1109/TII.2017.2772219

2018

Abstract

Real-time control systems use controllers that compute and issue setpoints within stringent delay constraints. Failure to do so, because of a crash or delay caused by software or hardware faults, can cause failure of the controlled resources. Recently, Axo, a protocol for masking crash and delay faults by replicating the controller, was proposed. Axo provides safety by discarding delayed setpoints, and it relies on the presence of valid setpoints for providing availability. To ensure that enough valid setpoints are issued, faulty controller replicas need to be detected and recovered. We present mechanisms for detection and recovery of delay- and crash-faulty replicas under the Axo framework. These mechanisms were designed to be soft state (i.e., their state can be reconstructed from received messages) to enable seamless addition of new replicas. Besides presenting the design, we analytically characterize the time to detect and recover a faulty replica, and we validate these characterizations experimentally. We demonstrate the performance of Axo by using two case studies: the first provides a stability analysis of an inverted pendulum system with Axo, and the second shows the fault-tolerance performance of Axo through a deployment on a real-time control system that controls a CIGRE low-voltage benchmark microgrid.
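
The abstract's idea of providing safety by discarding delayed setpoints can be illustrated with a minimal Python sketch. This is not the Axo protocol as published: the `Setpoint` fields, the `validity_horizon` parameter, and the functions `is_valid` and `first_valid` are hypothetical names chosen for illustration, and the sketch assumes that each setpoint carries the time at which it was issued and that clocks are comparable.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Setpoint:
    replica_id: str   # hypothetical identifier of the issuing controller replica
    value: float      # control value to apply to the controlled resource
    issued_at: float  # time at which the replica issued the setpoint, in seconds


def is_valid(sp: Setpoint, now: float, validity_horizon: float) -> bool:
    """Return True if the setpoint is not older than the assumed validity horizon."""
    age = now - sp.issued_at
    return 0.0 <= age <= validity_horizon


def first_valid(setpoints: Iterable[Setpoint], now: float,
                validity_horizon: float) -> Optional[Setpoint]:
    """Return the first non-delayed setpoint, or None if all are delayed.

    Delayed setpoints are discarded (safety); a result is available only
    if at least one replica delivered a valid setpoint (availability).
    """
    for sp in setpoints:
        if is_valid(sp, now, validity_horizon):
            return sp
    return None
```

With replication, a delayed setpoint from one replica is simply dropped, and a timely setpoint from another replica can still be applied; if every replica is delayed, nothing is applied, which is why faulty replicas need to be detected and recovered.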

Details

Title Axo: Detection and Recovery for Delay and Crash Faults in Real-Time Control Systems

Author(s) Maaz, Mashood Mohiuddin ; Saab, Wajeb ; Bliudze, Simon ; Le Boudec, Jean-Yves

Published in IEEE Transactions on Industrial Informatics

Volume 14

Issue 7

Pages 3065-3075

Date 2018

ISSN 1551-3203

Keywords

reliability; delay faults; fault detection; fault recovery; real-time; epfl-smartgrids

DOI https://doi.org/10.1109/TII.2017.2772219

Laboratories LCA2

Record appears in LCA2 - Computer Communications and Applications Laboratory 2; Peer-reviewed publications; Work produced at EPFL; Journal Articles; Published

Record creation date 2017-11-07
