Integrated Kernel Partitioning and Scheduling for Coarse-Grained Reconfigurable Arrays
Coarse-grained reconfigurable arrays (CGRAs) are a promising class of architectures conjugating flexibility and efficiency. Devising effective methodologies to map applications onto CGRAs is a challenging task, due to their parallel execution paradigm and constrained hardware resources. In order to handle complex applications, it is important to devise efficient strategies to partition a kernel into pieces that obey resource constraint and methodologies to schedule them on the underlying hardware. In this paper, we tackle these problems by proposing algorithms to address partitioning based on recursive searches over abstract trees. A novel scheduling strategy is also described that, leveraging differences in delays of various operations, is able to efficiently map operations on CGRA architectures. Experimental evidence on kernels derived from a diverse set of data flow graphs and EEMBC benchmarks demonstrate the efficacy of the described methods, which, when combined, achieve a higher runtime performance on a given mesh size than state-of-the-art approaches (as much as 38% for the benchmark applications considered).