Abstract

The performance of programs written in languages that follow the dataflow model of computation (MoC) depends largely on the configuration (partitioning, mapping, scheduling, and buffer dimensioning) chosen during the synthesis stages. This programming paradigm is also particularly well suited to heterogeneous parallel systems, because it is inherently free of memory contention and exposes opportunities for parallelism. Together, these two observations show the need for a way to evaluate design configurations and find good ones easily and automatically. This paper describes the methodology required for clock-accurate profiling of high-level dataflow programs written in RVC-CAL when synthesized on heterogeneous CPU/GPU co-processing platforms. It also extends to the heterogeneous paradigm an existing methodology for qualitatively estimating the performance of such programs as a function of the provided configuration, without the need to synthesize and profile every single configuration on the actual hardware platform. The approach is validated using two application programs and several configurations.

Details