Techniques for Identifying Elusive Corner-Case Bugs in Systems Software

Modern software is plagued by elusive corner-case bugs (e.g., security bugs). Because there are no scalable, automated ways of finding them, such bugs can remain hidden until software is deployed in production. This thesis proposes approaches to solve this problem. First, we present black-box and white-box fault injection mechanisms, which allow developers to test the behavior of their own code in the presence of failures in external components, e.g., in libraries, in the kernel, or in remote nodes of a distributed system. We describe how to make black-box fault injection more efficient, by prioritizing tests based on their estimated impact. For white-box testing, we proposed and implemented a technique to find Trojan messages in distributed systems, i.e., messages that are accepted as valid by receiver nodes, yet cannot be sent by any correct sender node. We show that Trojan messages can lead to subtle semantic bugs. We used fault injection techniques to find new bugs in systems such as the MySQL database, the Apache HTTP server, the FSP file service protocol suite, and the PBFT Byzantine-fault-tolerant replication library. Testing can find bugs and build confidence in the correctness of a system. However, exhaustive testing is often unfeasible, and therefore testing may not discover all bugs before a system is deployed. In the second part of this thesis, we describe how to automatically harden production systems, reducing the impact of any corner-case bugs missed by testing. We present a framework that reduces the overhead cost of instrumentation tools such as memory error detectors. Lowering the cost enables system developers to use such tools in production to harden their systems, reducing the impact of any remaining corner-case bugs. We used our framework to generate a version of the Linux kernel hardened with Address Sanitizer. Our hardened kernel has most of the benefit of full instrumentation: it detects the same vulnerabilities as full instrumentation (7 out of 11 privilege escalation exploits from 2013-2014 can be detected using instrumentation tools). Yet, it obtains these benefits at only a quarter of the overhead.

Related material