Infoscience

research article

Request, Coalesce, Serve, and Forget: Miss-Optimized Memory Systems for Bandwidth-Bound Cache-Unfriendly Applications on FPGAs

Asiatici, Mikhail • Ienne, Paolo
June 1, 2022
ACM Transactions on Reconfigurable Technology and Systems

Applications such as large-scale sparse linear algebra and graph analytics are challenging to accelerate on FPGAs because their short, irregular memory accesses result in low cache hit rates. Nonblocking caches reduce the bandwidth required by misses by requesting each cache line only once, even when multiple misses correspond to it. However, this reuse mechanism is traditionally implemented with an associative lookup, which limits the number of misses considered for reuse to a few tens at most. In this article, we present an efficient pipeline that can process and store thousands of outstanding misses in cuckoo hash tables in on-chip SRAM with minimal stalls. Because outstanding misses do not need a data array, this brings the same bandwidth advantage as a larger cache for a fraction of the area budget and can significantly speed up irregular, memory-bound, latency-insensitive applications. In addition, we extend nonblocking caches to generate variable-length bursts to memory, which increases the bandwidth delivered by DRAMs and their controllers. The resulting miss-optimized memory system provides up to 25% speedup with a 24× area reduction on 15 large sparse matrix-vector multiplication benchmarks evaluated on an embedded and a datacenter FPGA system.
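
The miss-coalescing idea in the abstract can be illustrated with a small software model. The Python sketch below is only a toy under stated assumptions, not the authors' hardware pipeline: the class names `CuckooMissTable` and `MissCoalescer`, the 64-byte `LINE_BYTES`, the two hash functions, the kick limit, and the overflow `stash` are all hypothetical choices made for illustration. It shows outstanding misses stored in a two-way cuckoo hash table keyed by cache-line tag, so that each line is requested from memory once no matter how many misses target it.

```python
LINE_BYTES = 64  # assumed cache-line size, not taken from the article


class CuckooMissTable:
    """Toy two-way cuckoo hash table: line tag -> list of waiting requesters."""

    def __init__(self, slots_per_way=1024, max_kicks=32):
        self.slots = slots_per_way
        self.max_kicks = max_kicks
        self.ways = [[None] * slots_per_way for _ in range(2)]
        self.stash = []  # hypothetical overflow area used when kicks run out

    def _index(self, way, tag):
        # Two different, cheap hash functions (illustrative choice only).
        if way == 0:
            return tag % self.slots
        return ((tag * 2654435761) >> 7) % self.slots

    def lookup(self, tag):
        for way in (0, 1):
            entry = self.ways[way][self._index(way, tag)]
            if entry is not None and entry[0] == tag:
                return entry
        for entry in self.stash:
            if entry[0] == tag:
                return entry
        return None

    def insert(self, tag, waiters):
        entry, way = (tag, waiters), 0
        for _ in range(self.max_kicks):
            idx = self._index(way, entry[0])
            # Place the entry; whatever occupied the slot becomes homeless.
            entry, self.ways[way][idx] = self.ways[way][idx], entry
            if entry is None:
                return
            way ^= 1  # retry the displaced entry in its other table
        self.stash.append(entry)  # real hardware would stall instead

    def remove(self, tag):
        for way in (0, 1):
            idx = self._index(way, tag)
            entry = self.ways[way][idx]
            if entry is not None and entry[0] == tag:
                self.ways[way][idx] = None
                return entry[1]
        for i, entry in enumerate(self.stash):
            if entry[0] == tag:
                return self.stash.pop(i)[1]
        return []


class MissCoalescer:
    """Issues one memory request per cache line with outstanding misses."""

    def __init__(self, table=None):
        self.table = table if table is not None else CuckooMissTable()
        self.memory_requests = []  # line tags actually sent to memory

    def miss(self, address, requester_id):
        """Record a miss; return True only if a new memory request is issued."""
        tag = address // LINE_BYTES
        entry = self.table.lookup(tag)
        if entry is not None:
            entry[1].append(requester_id)  # coalesce with the pending request
            return False
        self.table.insert(tag, [requester_id])
        self.memory_requests.append(tag)
        return True

    def fill(self, tag):
        """Memory response: return all waiters to serve, then forget the line."""
        return self.table.remove(tag)
```

In this model, two misses to addresses `0x1000` and `0x1008` fall in the same 64-byte line, so only the first call to `miss` issues a memory request, and `fill` on that line returns both requester IDs. Because the table stores only tags and waiter lists, it needs no data array, which is the area argument the abstract makes.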

Type
research article
DOI
10.1145/3466823
Web of Science ID

WOS:000764108100003

Author(s)
Asiatici, Mikhail  
Ienne, Paolo  
Date Issued

2022-06-01

Publisher

Association for Computing Machinery (ACM)

Published in
ACM Transactions on Reconfigurable Technology and Systems
Volume

15

Issue

2

Start page

13

Subjects

Computer Science, Hardware & Architecture • Computer Science • high performance computing • reconfigurable computing • nonblocking caches • DRAM • cuckoo hashing • irregular memory accesses • performance

Editorial or Peer reviewed

REVIEWED

Written at

EPFL

EPFL units
LAP  
Available on Infoscience
March 28, 2022
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/186583

Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved