Using register renaming and physical registers, modern microprocessors eliminate false data dependences from reuse of the instruction set defined registers (logical registers). High performance processors that have longer pipelines and a greater capacity to exploit instruction-level parallelism have more instructions in-flight and require more physical registers. Simultaneous multithreading architectures further exacerbate this register pressure.

This paper evaluates two register sharing techniques for reducing register usage. The first technique dynamically combines physical registers having the same value. The second technique combines the demand of several instructions updating the same logical register and share physical register storage among them. While similar techniques have been proposed previously, an important contribution of this paper is to exploit only special cases that provide most of the benefits of more general solutions but at a very low hardware complexity.

Despite the simplicity, our design reduces the required number of physical registers by more than 10% on some applications, and provides almost half of the total benefits of an aggressive (complex) scheme. More importantly, we show the simpler design to reduce register pressure has significant performance effects in a simultaneous multithreaded (SMT) architecture where register availability can be a bottleneck. Our results show an average of 25.7% performance improvement for an SMT architecture with 160 registers or, equivalently, similar performance as an SMT with 200 registers (25% more) but no register sharing.