Mutable Checkpoint-Restart: Automating Live Update for Generic Server Programs
The pressing demand to deploy software updates without stopping running programs has fostered much research on live update systems in the past decades. Prior solutions, however, either make strong assumptions on the nature of the update or require extensive and error-prone manual effort, factors which discourage live update adoption. This paper presents Mutable Checkpoint-Restart (MCR), a new live update solution for generic (multiprocess and multithreaded) server programs written in C. Compared to prior solutions, MCR can support arbitrary software updates and automate most of the common live update operations. The key idea is to allow the new version to restart as similarly to a fresh program initialization as possible, relying on existing code paths to automatically restore the old program threads and reinitialize a relevant portion of the program data structures. To transfer the remaining data structures, MCR relies on a combination of precise and conservative garbage collection techniques to trace all the global pointers and apply the required state transformations on the fly. Experimental results on popular server programs (Apache httpd, nginx, OpenSSH and vsftpd) confirm that our techniques can effectively automate problems previously deemed difficult at the cost of negligible run-time performance overhead (2% on average) and moderate memory overhead (3.9x on average).