This dissertation is concerned with static analysis of binary executables in a theoretically well-founded, sound, yet practical way. The major challenge is the reconstruction of a correct control flow graph in presence of indirect jumps, pointer arithmetic, and untyped variables. While static program analysis for proving safety properties or finding bugs usually targets source code, in many potential analysis scenarios only a binary is available. For instance, intellectual property issues can prevent source code from being accessible to verification specialists, and some analyses, such as malware detection, are by definition required to work with executables. Moreover, binary analysis can be useful even in situations where the source code is available, e.g., when the compiler is not part of the trusted computing base. In most of the existing work, a heuristic disassembler makes a best effort attempt to generate a plain text listing of the assembly instructions in the executable and feeds it to a separate static analysis component. The heuristics render this technique inherently unsound, and the control flow graphs retrieved from such listings are usually fragmented and incomplete. Several approaches have pointed out the possibility of using results of data flow analysis to augment disassembly and control flow reconstruction, but described this connection as suffering from a "chicken and egg" problem, since data flow analysis requires a control flow graph to work on. This dissertation argues for the integration of disassembly, control flow reconstruction, and static analysis in a unified process. It introduces a framework for simultaneous control and data flow analysis on low level binary code, which overcomes the "chicken and egg" problem and is proven to yield the most precise control flow graph with respect to the precision of the data flow domain. A very precise domain that lends itself well to control flow reconstruction is introduced in Bounded Address Tracking, a combined pointer and value analysis that supports pointer arithmetic. It tracks variable valuations up to a tunable bound on the number of values per variable per program location. Its path sensitivity generally allows strong updates to memory, i.e., heap regions are uniquely identified, and equips it with context sensitivity without assuming a correct layout of procedures. These building blocks are combined into an extensible program analysis architecture, which is implemented in a novel binary analysis tool. The tool, named Jakstab, works directly on binaries and disassembles instructions on demand while exploring the program's state space, allowing it to handle low level features such as overlapping instructions, which cause difficulties for regular disassemblers. The architecture is highly configurable to allow a wide range of analyses, from sound abstract interpretation to heuristics-supported disassembly. Its practical feasibility and improvements over existing approaches are shown through case studies on device driver binaries and system executables found on a regular desktop PC.
thesis.pdf
Publisher's version
openaccess
1.08 MB
Adobe PDF
78233600573403897487a09930675fa4