Inter-actions parallel execution on GPU from high-level dataflow synthesis
Recent GPU architectures provide parallel processing units in numbers that exceed those of CPU architectures by orders of magnitude. Although programs written in dataflow programming languages are well suited to heterogeneous systems, they might not expose a sufficient degree of parallelism to exploit efficiently the resources available on today’s GPUs. This paper describes how extending a dataflow-based approach for synthesizing programs for mixed CPU and GPU architectures can increase the parallelism of execution on the GPU. The extended approach consists of a new methodology for scheduling the actions of actors in parallel, which uses the hardware resources available on modern GPUs more efficiently. This is possible without imposing any limitation on the dataflow model of computation (MoC) of the network, so fully dynamic MoCs are supported as well. The paper also introduces the relevant features of recent NVIDIA GPUs on which the approach relies and describes how they are used to reconfigure dynamically the level of parallelism of the actual actor execution. Finally, the paper explains and justifies the need for a specially designed FIFO buffer that both preserves the dataflow computational model and enables fully parallel data accesses.
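To make the last point concrete, the following is a minimal illustrative sketch, not the paper's implementation, of a FIFO whose tokens can be read by several firings of the same action at once. It is written in plain Python for readability; on a GPU each firing would typically map to a separate thread. The names `ParallelFifo`, `peek`, `commit_reads`, and `fire_in_parallel`, and the fixed consumption `rate`, are hypothetical and chosen only for this example.

```python
class ParallelFifo:
    """Ring buffer whose tokens can be read by several firings at once (hypothetical sketch)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.storage = [None] * capacity
        self.read_index = 0   # total number of tokens consumed so far
        self.write_index = 0  # total number of tokens produced so far

    def count(self):
        """Number of tokens currently available for reading."""
        return self.write_index - self.read_index

    def rooms(self):
        """Number of free slots currently available for writing."""
        return self.capacity - self.count()

    def write(self, tokens):
        if len(tokens) > self.rooms():
            raise OverflowError("not enough room in the FIFO")
        for offset, token in enumerate(tokens):
            self.storage[(self.write_index + offset) % self.capacity] = token
        self.write_index += len(tokens)

    def peek(self, firing, rate, offset):
        """Return token `offset` of firing number `firing` without changing any state."""
        position = self.read_index + firing * rate + offset
        if position >= self.write_index:
            raise IndexError("token not yet produced")
        return self.storage[position % self.capacity]

    def commit_reads(self, firings, rate):
        """Advance the read index once, after all firings have completed."""
        self.read_index += firings * rate


def fire_in_parallel(fifo, rate, action):
    """Fire `action` as many times as the available tokens allow.

    Each firing reads a disjoint window of `rate` tokens and mutates nothing,
    so the iterations are independent of each other. Results are returned in
    firing order, which is the order a sequential execution would produce.
    """
    firings = fifo.count() // rate
    results = [
        action([fifo.peek(firing, rate, offset) for offset in range(rate)])
        for firing in range(firings)
    ]
    fifo.commit_reads(firings, rate)
    return results
```

In this sketch, reads are side-effect free and the read index is advanced only once after all firings finish, which is what lets independent firings access the buffer concurrently while the token order seen by each firing matches the sequential dataflow semantics. For example, with tokens `1, 2, 3, 4, 5` in the buffer and `rate = 2`, two firings can run at once on the windows `[1, 2]` and `[3, 4]`, leaving token `5` for a later firing.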