Resolve: Enabling Accurate Parallel Monitoring under Relaxed Memory Models
Hardware-assisted instruction-grain monitoring frameworks provide high-coverage, low-overhead debugging support for parallel programs. Unfortunately, existing frameworks are ill-suited for the relaxed memory models employed by nearly all modern processor architectures, e.g., TSO (x86, SPARC), RMO (SPARC), and Weak Consistency (ARMv7). For TSO, prior proposals hint at a solution but provide no implementation or evaluation, and they fail to correctly handle important corner cases such as byte-level dependences. For more relaxed memory models such as RMO and Weak Consistency, prior frameworks deadlock, rendering them unable to detect any bugs past the first deadlock! This paper presents Resolve, the first hardware-assisted instruction-grain monitoring framework that is complete, correct, and deadlock-free under relaxed memory models. Resolve is based on the observation that while relaxed memory models can produce cycles of dependences that deadlock prior approaches, these cycles can be overcome by consulting the dataflow graph of the application threads being monitored, instead of their program order. Resolve handles all possible cycles arising in relaxed memory models through a careful approach that uses both dataflow-based processing and versioning of monitoring state, as appropriate. Moreover, we provide the first quantitative characterization of the cycles arising under RMO, demonstrating that such cycles are prevalent and persistent, and hence that deadlock is a real problem that must be addressed. Yet these cycles are neither so frequent nor so complex as to be costly to handle: Resolve’s overheads are negligible. Finally, we present a simple and novel hardware mechanism for properly synchronizing updates to monitoring state under relaxed memory models, improving performance by up to 35% over the judicious use of memory fences.
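The central observation, that a dependence cycle seen under program order can disappear under dataflow order, can be illustrated with a toy model. The sketch below is not Resolve's algorithm. It assumes a hypothetical two-thread execution in which each thread stores to one location and then loads the other, and each load observes the old value. The event names, the edge lists, and the `has_cycle` helper are all assumptions made for illustration.

```python
from collections import defaultdict


def has_cycle(edges):
    """Return True if the directed graph given as (src, dst) pairs has a cycle."""
    graph = defaultdict(list)
    nodes = set()
    for src, dst in edges:
        graph[src].append(dst)
        nodes.update((src, dst))

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the DFS stack, finished
    color = {n: WHITE for n in nodes}

    def visit(n):
        color[n] = GREY
        for m in graph[n]:
            if color[m] == GREY:  # back edge: a cycle exists
                return True
            if color[m] == WHITE and visit(m):
                return True
        color[n] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in sorted(nodes))


# Hypothetical execution (an assumption, not taken from the paper):
#   Thread 0: St X ; Ld Y      Thread 1: St Y ; Ld X
# Each load returns the OLD value, so the monitor must process each load
# before the other thread's store to the same location.
inter_thread = [
    ("T0:Ld Y", "T1:St Y"),
    ("T1:Ld X", "T0:St X"),
]

# Ordering events within a thread by program order.
program_order = [
    ("T0:St X", "T0:Ld Y"),
    ("T1:St Y", "T1:Ld X"),
]

# Ordering events within a thread by dataflow only: the store and the load
# in each thread touch different locations and share no values, so there
# are no intra-thread edges in this toy example.
dataflow_order = []

cyclic_under_program_order = has_cycle(program_order + inter_thread)
cyclic_under_dataflow = has_cycle(dataflow_order + inter_thread)
```

In this toy execution the program-order graph contains the cycle St X, Ld Y, St Y, Ld X, so a monitor that waits on every edge would never make progress, while the dataflow-ordered graph is acyclic. Cases where dataflow ordering alone does not suffice are outside the scope of this sketch.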