HRL: Efficient and Flexible Reconfigurable Logic for Near-Data Processing

Gao, Mingyu; Kozyrakis, Christos

doi:10.1109/HPCA.2016.7446059

conference paper

2016

Proceedings of the 2016 IEEE International Symposium on High-Performance Computer Architecture (HPCA-22)

22nd IEEE International Symposium on High-Performance Computer Architecture (HPCA)

The energy constraints due to the end of Dennard scaling, the popularity of in-memory analytics, and the advances in 3D integration technology have led to renewed interest in near-data processing (NDP) architectures that move processing closer to main memory. Due to the limited power and area budgets of the logic layer, the NDP compute units should be area and energy efficient while providing sufficient compute capability to match the high bandwidth of vertical memory channels. They should also be flexible to accommodate a wide range of applications. Towards this goal, NDP units based on fine-grained (FPGA) and coarse-grained (CGRA) reconfigurable logic have been proposed as a compromise between the efficiency of custom engines and the flexibility of programmable cores. Unfortunately, FPGAs incur significant area overheads for bit-level reconfiguration, while CGRAs consume significant power in the interconnect and are inefficient for irregular data layouts and control flows. This paper presents Heterogeneous Reconfigurable Logic (HRL), a reconfigurable array for NDP systems that improves on both FPGA and CGRA arrays. HRL combines both coarse-grained and fine-grained logic blocks, separates routing networks for data and control signals, and uses specialized units to effectively support branch operations and irregular data layouts in analytics workloads. HRL has the power efficiency of FPGA and the area efficiency of CGRA. It improves performance per Watt by 2.2x over FPGA and 1.7x over CGRA. For NDP systems running MapReduce, graph processing, and deep neural networks, HRL achieves 92% of the peak performance of an NDP system based on custom accelerators for each application.

Type

conference paper

DOI

10.1109/HPCA.2016.7446059

Web of Science ID

WOS:000381808200011

Authors

Gao, Mingyu; Kozyrakis, Christos

Publication date

2016

Publisher

IEEE

Published in

Proceedings of the 2016 IEEE International Symposium on High-Performance Computer Architecture (HPCA-22)

ISBN of the book

978-1-4673-9211-2

Publisher place

New York

Total of pages

12

Series title/Series vol.

International Symposium on High-Performance Computer Architecture-Proceedings

Start page

126

End page

137

Peer reviewed

REVIEWED

EPFL units

SAIL

Event name

22nd IEEE International Symposium on High-Performance Computer Architecture (HPCA)

Event place

Barcelona, Spain

Event date

March 12-16, 2016

Available on Infoscience

October 18, 2016

Use this identifier to reference this record

https://infoscience.epfl.ch/handle/20.500.14299/130041