An important challenge for a dataflow designer is to explore the design space efficiently in order to find a set of configurations that satisfies a given objective function. The exploration may involve partitioning, scheduling and buffer dimensioning, which together should allow the designer to benefit as much as possible from the potential parallelism of an application. Successful exploration is strongly facilitated by performance estimation. This paper presents a tool that provides a high-precision estimate of a program's execution on a given platform under various sets of configurations. It demonstrates which information about the multi-core program execution can be extracted and successfully used to drive the optimization procedures. The experimental results are confirmed by actual execution on different types of platforms.