LTRF: Enabling High-Capacity Register Files for GPUs via Hardware/Software Cooperative Register Prefetching

Sadrosadati, Mohammad; Mirhosseini, Amirhossein; Ehsani, Seyed Borna; Sarbazi-Azad, Hamid; Drumond, Mario; Falsafi, Babak; Ausavarungnirun, Rachata; Mutlu, Onur

doi:10.1145/3173162.3173211

2018

Abstract

Graphics Processing Units (GPUs) employ large register files to accommodate all active threads and accelerate context switching. Unfortunately, register files are a scalability bottleneck for future GPUs due to long access latency, high power consumption, and large silicon area provisioning. Prior work proposes a hierarchical register file to reduce register file power consumption by caching registers in a smaller register file cache. Unfortunately, this approach does not improve register access latency due to the low hit rate in the register file cache. In this paper, we propose the Latency-Tolerant Register File (LTRF) architecture to achieve low latency in a two-level hierarchical structure while keeping power consumption low. We observe that compile-time interval analysis enables us to divide GPU program execution into intervals with an accurate estimate of a warp’s aggregate register working-set within each interval. The key idea of LTRF is to prefetch the estimated register working-set from the main register file to the register file cache under software control, at the beginning of each interval, and overlap the prefetch latency with the execution of other warps. Our experimental results show that LTRF enables high-capacity yet long-latency main GPU register files, paving the way for various optimizations. As an example optimization, we implement the main register file with emerging high-density high-latency memory technologies, enabling 8× larger capacity and improving overall GPU performance by 31% while reducing register file power consumption by 46%.
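
As a rough illustration of the interval idea described in the abstract, the sketch below greedily splits a straight-line instruction sequence into intervals whose register working-set fits in the register file cache. This is not the paper's compiler pass, which the abstract does not detail; the function name `split_into_intervals`, the parameter `cache_regs`, and the restriction to straight-line code (no branches or loops) are all assumptions made for the example.

```python
from typing import List, Set, Tuple

# Hypothetical representation: each instruction is the set of register
# names it reads or writes. Control flow is ignored in this sketch.
Interval = Tuple[int, int, Set[str]]  # (start, end_exclusive, working_set)


def split_into_intervals(instrs: List[Set[str]], cache_regs: int) -> List[Interval]:
    """Greedily cut the sequence so each interval touches at most
    `cache_regs` distinct registers (the assumed per-warp cache capacity)."""
    intervals: List[Interval] = []
    start = 0
    working: Set[str] = set()
    for i, regs in enumerate(instrs):
        if len(regs) > cache_regs:
            raise ValueError("a single instruction exceeds the cache capacity")
        if len(working | regs) > cache_regs:
            # Close the current interval; its working set is what would be
            # prefetched into the register file cache at the interval start.
            intervals.append((start, i, working))
            start, working = i, set()
        working = working | regs
    if start < len(instrs):
        intervals.append((start, len(instrs), working))
    return intervals


program = [{"r0", "r1"}, {"r1", "r2"}, {"r3", "r4"}, {"r4", "r5"}]
for begin, end, regs in split_into_intervals(program, cache_regs=3):
    print(f"instructions [{begin}, {end}): prefetch {sorted(regs)}")
```

In this toy setting, each interval boundary marks a point where a warp would issue its prefetch and yield to other warps while the registers are transferred.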

Details

Title LTRF: Enabling High-Capacity Register Files for GPUs via Hardware/Software Cooperative Register Prefetching

Author(s) Sadrosadati, Mohammad; Mirhosseini, Amirhossein; Ehsani, Seyed Borna; Sarbazi-Azad, Hamid; Drumond, Mario; Falsafi, Babak; Ausavarungnirun, Rachata; Mutlu, Onur

Published in Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS '18

Pagination 14

Pages 489-502

Presented at Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS '18, Williamsburg, VA, USA, March 24th – March 28th, 2018

Date 2018

Keywords (free)

GPUs; Register File Design; Latency Tolerance; Energy Efficiency; Memory Technology; Memory Latency

DOI https://doi.org/10.1145/3173162.3173211

Other identifier(s) DOI: https://doi.org/10.1145/3173162.3173211

Laboratories PARSA

The document appears in Scientific production and competences > I&C - Faculté Informatique & Communications > IINFCOM > PARSA - Laboratoire d'architecture de systèmes parallèles
Peer-reviewed publications
Conference papers
Work produced at EPFL

Record creation date 2018-04-18
