Schnarr, Eric; Larus, James R.
2013-12-23
1996
doi:10.1109/MICRO.1996.566469
https://infoscience.epfl.ch/handle/20.500.14299/98760

Modern microprocessors offer more instruction-level parallelism than most programs and compilers can currently exploit. The resulting disparity between a machine's peak and actual performance, while frustrating for computer architects and chip manufacturers, opens the exciting possibility of low-cost instrumentation for measurement, simulation, or emulation. Instrumentation code that executes in previously unused processor cycles is effectively hidden. On two superscalar SPARC processors, a simple, local scheduler hid an average of 13% of the overhead cost of profiling instrumentation in the SPECINT benchmarks and an average of 33% of the profiling cost in the SPECFP benchmarks.

Instruction Scheduling and Executable Editing
text::conference output::conference proceedings::conference paper