Axo: Masking Delay Faults in Real-Time Control Systems
We consider real-time control systems that consist of a controller that computes and sends setpoints to be implemented in physical processes through process agents. We focus on systems that use commercial off-the-shelf hardware and software components. Setpoints of these systems have strict real-time constraints: Implementing a setpoint after its deadline, or not receiving setpoints within a deadline, can cause failure. In this paper, we address delay faults: faults that cause setpoints to violate their real-time constraints. We present Axo, a fault-tolerance protocol that guarantees safety and improves availability for a class of such systems that exhibit two main properties: the setpoints must have a known validity horizon, and process agents must be capable of handling duplicate setpoints. To reason about delay faults, and consequently design Axo, we present an abstraction of a controller; the abstraction applies to a wide range of real-time control systems. We prove guarantees of safety and availability. Finally, we present an implementation of Axo and the results of the tests performed with Commelec, a real-time control system for electric grids.