Techniques for Detection, Root Cause Diagnosis, and Classification of In-Production Concurrency Bugs
Concurrency bugs are among the worst bugs that
plague software. They slow down software development
because it can take developers weeks or even months
to identify and fix them.
In-production detection, root cause diagnosis, and classification of
concurrency bugs are challenging because these activities require
heavyweight analyses, such as exploring program paths and determining
failing program inputs and schedules, that are ill-suited to
software running in production.
This dissertation develops practical techniques for the detection,
root cause diagnosis, and classification of concurrency bugs for in-production
software. Furthermore, we develop ways for developers
to better reason about concurrent programs. This dissertation builds
upon the following principles:
— The approach in this dissertation spans multiple layers of the
system stack, because concurrency spans many layers of the
system stack.
— It performs most of the heavyweight analyses in-house and resorts
to minimal in-production analysis in order to move the
heavy lifting to where it is least disruptive.
— It eschews custom hardware solutions that may be infeasible to
implement in the real world.
Relying on the aforementioned principles, this dissertation introduces:
1. Techniques to automatically detect concurrency bugs (data races
and atomicity violations) in production by combining in-house
static analysis and in-production dynamic analysis.
2. A technique to automatically identify the root causes of in-production
failures, with a particular emphasis on failures caused
by concurrency bugs.
3. A technique that, given a data race, automatically classifies it
based on its potential consequence, allowing developers to answer
questions such as “can the data race cause a crash or a
hang?”, or “does the data race have any observable effect?”.
We build a toolchain that implements all the aforementioned techniques.
We show that the tools we develop in this dissertation are
effective, incur low runtime performance overhead, and have high
accuracy and precision.