Performance evaluation of acceleration of convolutional layers on OpenEdgeCGRA
Efficient deployment of deep learning solutions on the edge has recently received increasing attention, and new platforms are emerging to meet the growing demand for flexibility and high performance. In this work, we explore the efficient mapping of convolutional layers on an open-hardware, low-power Coarse-Grain Reconfigurable Array (CGRA), namely OpenEdgeCGRA. We explore both direct implementations of convolution and solutions that transform it into a matrix multiplication through an Im2col transformation, and we experiment with various tensor parallelism axes. We show that for this hardware target, direct convolution, coupled with weight parallelism, reaches the best latency and energy efficiency, outperforming a pure CPU implementation by 3.4× and 9.9× in terms of energy and latency, respectively.
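The two formulations named in the abstract can be illustrated with a minimal plain-Python sketch. It is not the authors' CGRA kernel code and does not model weight parallelism or any hardware mapping. The function names, the (C, H, W) input layout, the (K, C, R, S) weight layout, stride 1, and the absence of padding are all assumptions made for illustration.

```python
# Illustrative sketch only (assumed layouts: input (C, H, W), weights (K, C, R, S)).
# Stride 1, no padding. "Convolution" is used in the deep-learning sense
# (cross-correlation, no kernel flip).

def direct_conv(x, w):
    """Direct convolution: accumulate each output element with nested loops."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    K, R, S = len(w), len(w[0][0]), len(w[0][0][0])
    OH, OW = H - R + 1, W - S + 1
    out = [[[0] * OW for _ in range(OH)] for _ in range(K)]
    for k in range(K):
        for i in range(OH):
            for j in range(OW):
                acc = 0
                for c in range(C):
                    for r in range(R):
                        for s in range(S):
                            acc += x[c][i + r][j + s] * w[k][c][r][s]
                out[k][i][j] = acc
    return out

def im2col(x, R, S):
    """Unroll every R x S receptive field into a flat patch of length C*R*S."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    OH, OW = H - R + 1, W - S + 1
    return [
        [x[c][i + r][j + s] for c in range(C) for r in range(R) for s in range(S)]
        for i in range(OH)
        for j in range(OW)
    ]

def im2col_conv(x, w):
    """Convolution expressed as a matrix multiplication over im2col patches."""
    H, W = len(x[0]), len(x[0][0])
    K, C, R, S = len(w), len(w[0]), len(w[0][0]), len(w[0][0][0])
    OH, OW = H - R + 1, W - S + 1
    patches = im2col(x, R, S)  # OH*OW patches, each of length C*R*S
    # Flatten each filter in the same (c, r, s) order used by im2col.
    w_rows = [
        [w[k][c][r][s] for c in range(C) for r in range(R) for s in range(S)]
        for k in range(K)
    ]
    # Matrix product: (K x C*R*S) times (C*R*S x OH*OW).
    flat = [[sum(a * b for a, b in zip(row, p)) for p in patches] for row in w_rows]
    # Reshape each flat output row back to OH x OW.
    return [[flat[k][i * OW:(i + 1) * OW] for i in range(OH)] for k in range(K)]

# Small example: one 3x3 input channel, one 2x2 filter.
x_small = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]
w_small = [[[[1, 0], [0, 1]]]]
```

Both functions return the same result on the same input. The Im2col variant trades extra memory for the unrolled patches against a regular matrix-multiplication structure, while the direct variant reads the input in place.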
2-s2.0-85199193537
Politecnico di Torino
École Polytechnique Fédérale de Lausanne
École Polytechnique Fédérale de Lausanne
Politecnico di Torino
École Polytechnique Fédérale de Lausanne
Politecnico di Torino
Politecnico di Torino
2024-05-07
9798400704925
67
70
REVIEWED
EPFL
| Event name | Event acronym | Event place | Event date |
| | | Ischia, Italy | 2024-05-07 - 2024-05-09 |
| Funder | Funding(s) | Grant Number | Grant URL |
| ACCESS | | | |
| Wyss Center for Bio and Neuro Engineering | | | |
| InnoHK | | | |