Back to the Future: Fault-tolerant Live Update with Time-traveling State Transfer

Live update is a promising solution to bridge the need to frequently update a software system with the pressing demand for high availability in mission-critical environments. While many research solutions have been proposed over the years, systems that allow software to be updated on the fly are still far from reaching widespread adoption in the system administration community. We believe this trend is largely motivated by the lack of tools to automate and validate the live update process. A major obstacle, in particular, is represented by state transfer, which existing live update tools largely delegate to the programmer despite the great effort involved. This paper presents time-traveling state transfer, a new automated and fault-tolerant live update technique. Our approach isolates different program versions into independent processes and uses a semantics-preserving state transfer transaction—across multiple past, future, and reversed versions—to validate the program state of the updated version. To automate the process, we complement our live update technique with a generic state transfer framework explicitly designed to minimize the overall programming effort. Our time-traveling technique can seamlessly integrate with existing live update tools and automatically recover from arbitrary run-time and memory errors in any part of the state transfer code, regardless of the particular implementation used. Our evaluation confirms that our update techniques can withstand arbitrary failures within our fault model, at the cost of only modest performance and memory overhead.

Published in:
Proceedings of the 27th USENIX Large Installation and System Administration Conference, 89-104
Presented at:
27th USENIX Large Installation and System Administration Conference, Washington, D.C., USA, November 3-8, 2013
The USENIX Association
Awarded Best Student Paper.

 Record created 2017-04-22, last modified 2018-03-17

External link:
Download fulltext
Rate this document:

Rate this document:
(Not yet reviewed)