DynaBurst: Dynamically Assemblying DRAM Bursts over a Multitude of Random Accesses

Asiatici, Mikhail; Ienne, Paolo

Date: 2019-01-01 (record dated 2020-03-25)
DOI: 10.1109/FPL.2019.00049
URL: https://infoscience.epfl.ch/handle/20.500.14299/167605
WOS:000518670300039
Type: text::conference output::conference proceedings::conference paper

The effective bandwidth of the FPGA external memory, usually DRAM, is extremely sensitive to the access pattern. Nonblocking caches that handle thousands of outstanding misses (miss-optimized memory systems) can dynamically improve bandwidth utilization whenever memory accesses are irregular and application-specific optimizations are not available or are too costly in terms of design time. However, they require a memory controller with wide data ports on the FPGA side and cannot fully take advantage of the memory interfaces with multiple narrow ports that are common on SoC FPGAs. Moreover, as their scope is limited to single memory requests, the access pattern they generate may cause frequent DRAM row conflicts, which further reduce DRAM bandwidth. In this paper, we propose DynaBurst, an extension of miss-optimized memory systems that generates variable-length bursts to the memory controller. By making memory accesses locally more sequential, we minimize the number of DRAM row conflicts, and by adapting the burst length on a per-request basis, we minimize bandwidth wastage. On a multiple, narrow-ported DDR3 controller, we provide 28% geometric mean and up to 3.4x speedup compared to a traditional nonblocking cache of the same area, while the prior single-request approach would not have been cost-effective. On a controller with a single, wide port, we can further improve the performance of miss-optimized systems by up to 2.4x.
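
As a rough software illustration of the variable-length burst idea described in the abstract, the sketch below groups a set of outstanding cache-line misses into bursts that each cover only the span of lines actually requested within an aligned window. It is not the DynaBurst hardware design: the function names `assemble_bursts` and `wasted_lines`, the `max_burst` parameter, and the aligned-window policy are all assumptions made for this example.

```python
from collections import defaultdict


def assemble_bursts(pending_lines, max_burst=8):
    """Group pending cache-line indices into variable-length bursts.

    Hypothetical policy (not taken from the paper): lines are bucketed into
    aligned windows of `max_burst` lines, and each window yields one burst
    that spans from its lowest to its highest requested line.
    Returns a list of (start_line, length) tuples sorted by start_line.
    """
    windows = defaultdict(list)
    for line in pending_lines:
        windows[line // max_burst].append(line)

    bursts = []
    for window in sorted(windows):
        lines = windows[window]
        first, last = min(lines), max(lines)
        bursts.append((first, last - first + 1))
    return bursts


def wasted_lines(pending_lines, bursts):
    """Count lines fetched by the bursts that no pending miss asked for."""
    requested = set(pending_lines)
    fetched = sum(length for _, length in bursts)
    return fetched - len(requested)


if __name__ == "__main__":
    misses = [3, 17, 1, 16, 2, 40]
    bursts = assemble_bursts(misses, max_burst=8)
    print(bursts, wasted_lines(misses, bursts))
```

In this toy model, dense clusters of misses become short sequential bursts with no wasted lines, while two misses at opposite ends of a window (for example lines 0 and 7) produce one full-length burst that fetches six unrequested lines. That is the trade-off between sequentiality and bandwidth wastage that the abstract refers to.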