000254995 001__ 254995
000254995 005__ 20190317000947.0
000254995 0247_ $$a10.1145/3173162.3173211$$2doi
000254995 02470 $$2DOI$$a10.1145/3173162.3173211
000254995 037__ $$aCONF
000254995 245__ $$aLTRF: Enabling High-Capacity Register Files for GPUs via Hardware/Software Cooperative Register Prefetching
000254995 260__ $$c2018
000254995 269__ $$a2018
000254995 300__ $$a14
000254995 336__ $$aConference Papers
000254995 520__ $$aGraphics Processing Units (GPUs) employ large register files to accommodate all active threads and accelerate context switching. Unfortunately, register files are a scalability bottleneck for future GPUs due to long access latency, high power consumption, and large silicon area provisioning. Prior work proposes a hierarchical register file to reduce register file power consumption by caching registers in a smaller register file cache. However, this approach does not improve register access latency due to the low hit rate in the register file cache. In this paper, we propose the Latency-Tolerant Register File (LTRF) architecture to achieve low latency in a two-level hierarchical structure while keeping power consumption low. We observe that compile-time interval analysis enables us to divide GPU program execution into intervals with an accurate estimate of a warp’s aggregate register working-set within each interval. The key idea of LTRF is to prefetch the estimated register working-set from the main register file to the register file cache under software control, at the beginning of each interval, and overlap the prefetch latency with the execution of other warps. Our experimental results show that LTRF enables high-capacity yet long-latency main GPU register files, paving the way for various optimizations. As an example optimization, we implement the main register file with emerging high-density high-latency memory technologies, enabling 8× larger capacity and improving overall GPU performance by 31% while reducing register file power consumption by 46%.
000254995 6531_ $$aGPUs
000254995 6531_ $$aRegister File Design
000254995 6531_ $$aLatency Tolerance
000254995 6531_ $$aEnergy Efficiency
000254995 6531_ $$aMemory Technology
000254995 6531_ $$aMemory Latency
000254995 700__ $$aSadrosadati, Mohammad
000254995 700__ $$aMirhosseini, Amirhossein
000254995 700__ $$aEhsani, Seyed Borna
000254995 700__ $$aSarbazi-Azad, Hamid
000254995 700__ $$aDrumond, Mario
000254995 700__ $$aFalsafi, Babak
000254995 700__ $$aAusavarungnirun, Rachata
000254995 700__ $$aMutlu, Onur
000254995 7112_ $$dMarch 24th – March 28th, 2018$$cWilliamsburg, VA, USA$$aProceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS '18
000254995 773__ $$q489-502
000254995 8564_ $$uhttps://infoscience.epfl.ch/record/254995/files/ltrf-asplos18.pdf$$zFinal$$s1079218
000254995 8560_ $$fmario.drumond@epfl.ch
000254995 8564_ $$uhttps://infoscience.epfl.ch/record/254995/files/ltrf-asplos18.pdf?subformat=pdfa$$zFinal$$s2453841$$xpdfa
000254995 909C0 $$xU11837$$mbabak.falsafi@epfl.ch$$pPARSA$$0252231
000254995 909CO $$qGLOBAL_SET$$pconf$$pIC$$ooai:infoscience.epfl.ch:254995
000254995 960__ $$amario.drumond@epfl.ch
000254995 961__ $$alaurence.gauvin@epfl.ch
000254995 973__ $$aEPFL$$rREVIEWED
000254995 980__ $$aCONF
000254995 981__ $$aoverwrite