conference paper

Stop Crying Over Your Cache Miss Rate: Handling Efficiently Thousands of Outstanding Misses in FPGAs

Asiatici, Mikhail • Ienne, Paolo
January 1, 2019
Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA'19)
ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA)

FPGAs rely on massive datapath parallelism to accelerate applications even with a low clock frequency. However, applications such as sparse linear algebra and graph analytics have their throughput limited by irregular accesses to external memory, for which typical caches provide little benefit because of very frequent misses. Non-blocking caches are widely used on CPUs to reduce the negative impact of misses and thus increase the performance of applications with low cache hit rates; however, they rely on associative lookup to handle multiple outstanding misses, which limits their scalability, especially on FPGAs. This results in frequent stalls whenever the application has a very low hit rate. In this paper, we show that by handling thousands of outstanding misses without stalling we can achieve a massive increase in memory-level parallelism, which can significantly speed up irregular, memory-bound, latency-insensitive applications. By storing miss information in cuckoo hash tables in block RAM instead of associative memory, we show how a non-blocking cache can be modified to support up to three orders of magnitude more misses. The resulting miss-optimized architecture provides new Pareto-optimal and even Pareto-dominant design points in the area-delay space for twelve large sparse matrix-vector multiplication benchmarks, providing up to 25% speedup with a 24x area reduction, or up to 2x speedup with similar area, compared to traditional hit-optimized architectures.
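
The central data-structure change described in the abstract is replacing the associative miss-tracking memory of a conventional non-blocking cache with cuckoo hash tables held in block RAM. Below is a minimal software sketch of that idea, not the paper's hardware design: it tracks outstanding misses in a two-table cuckoo hash keyed by cache-line address, with bounded displacement on collisions. All names and parameters (CuckooMSHR, MissEntry, max_kicks, the hash constants, the table size) are illustrative assumptions, not taken from the paper.

// Software sketch only: cuckoo-hash tracking of outstanding misses,
// keyed by cache-line address instead of a fully associative lookup.
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

struct MissEntry {
    uint64_t line_addr;   // cache-line address of the outstanding miss
    uint32_t subentries;  // e.g. count/list of requests waiting on this line
};

class CuckooMSHR {
public:
    explicit CuckooMSHR(size_t slots_per_table = 1024)
        : t0_(slots_per_table), t1_(slots_per_table) {}

    // Returns the entry for line_addr, or nullptr if no miss is outstanding.
    MissEntry* lookup(uint64_t line_addr) {
        auto& s0 = t0_[h0(line_addr)];
        if (s0 && s0->line_addr == line_addr) return &*s0;
        auto& s1 = t1_[h1(line_addr)];
        if (s1 && s1->line_addr == line_addr) return &*s1;
        return nullptr;
    }

    // Inserts a new miss; on a collision, evicts the resident entry and
    // re-inserts it into its alternative table (cuckoo displacement).
    // Returns false if the displacement chain exceeds the bound, which
    // would correspond to a stall/retry condition in a real design.
    bool insert(MissEntry e, int max_kicks = 32) {
        for (int i = 0; i < max_kicks; ++i) {
            auto& s0 = t0_[h0(e.line_addr)];
            if (!s0) { s0 = e; return true; }
            std::swap(*s0, e);            // displace the occupant of table 0
            auto& s1 = t1_[h1(e.line_addr)];
            if (!s1) { s1 = e; return true; }
            std::swap(*s1, e);            // displace the occupant of table 1
        }
        return false;
    }

private:
    // Two simple, independent hash functions (illustrative only).
    size_t h0(uint64_t a) const { return (a * 0x9E3779B97F4A7C15ULL) % t0_.size(); }
    size_t h1(uint64_t a) const { return ((a ^ (a >> 21)) * 0xC2B2AE3D27D4EB4FULL) % t1_.size(); }

    std::vector<std::optional<MissEntry>> t0_, t1_;
};

In a hardware realization each table would presumably map to on-chip block RAM with constant-time lookups, and the bounded displacement keeps worst-case insertion handling predictable; the failure path above stands in for whatever stall or retry mechanism the actual design uses.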

Type: conference paper
DOI: 10.1145/3289602.3293901
Web of Science ID: WOS:000522383700035
Author(s): Asiatici, Mikhail • Ienne, Paolo
Date Issued: 2019-01-01
Publisher: Association for Computing Machinery (ACM)
Publisher place: New York
Published in: Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA'19)
ISBN of the book: 978-1-4503-6137-8
Start page: 310
End page: 319
Subjects: Computer Science, Theory & Methods • Computer Science
Peer reviewed: REVIEWED
Written at: EPFL
EPFL units: LAP
Event name: ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA)
Event place: Seaside, CA
Event date: Feb 24-26, 2019

Available on Infoscience: April 12, 2020
Use this identifier to reference this record: https://infoscience.epfl.ch/handle/20.500.14299/168140