Abstract

As a central part of resource management, the OS thread scheduler must maintain the following, simple, invariant: make sure that ready threads are scheduled on available cores. As simple as it may seem, we found that this invari- ant is often broken in Linux. Cores may stay idle for sec- onds while ready threads are waiting in runqueues. In our experiments, these performance bugs caused many-fold per- formance degradation for synchronization-heavy scientific applications, 13% higher latency for kernel make, and a 14- 23% decrease in TPC-H throughput for a widely used com- mercial database. The main contribution of this work is the discovery and analysis of these bugs and providing the fixes. Conventional testing techniques and debugging tools are in- effective at confirming or understanding this kind of bugs, because their symptoms are often evasive. To drive our in- vestigation, we built new tools that check for violation of the invariant online and visualize scheduling activity. They are simple, easily portable across kernel versions and run with a negligible overhead. We believe that making these tools part of the kernel developers’ tool belt can help keep this type of bugs at bay.

Details

Actions