Exploring the design configurations of dynamic dataflow programs executed on many-core or multi-core platforms is, in general, a very difficult task. Determining a close-to-optimal partitioning, scheduling and buffer dimensioning configuration with respect to a performance optimization function belongs to the class of NP-complete problems. To explore the space of feasible solutions with efficient heuristics that look for good-quality solutions, it is important to evaluate design points in terms of the performance optimization function with sufficient precision, without having to physically execute the program on the platform. This paper presents a performance estimation approach and an associated software tool capable of exploring the space of feasible solutions with a high level of accuracy while using only a limited set of measurements from the physical processing platform. Moreover, the estimation model allows the identification of possible improvements that can be applied to different configurations. The reported results validate the accuracy of the methodology using dataflow implementations of dynamic video codec designs on two different classes of platforms: Transport Triggered Architecture and Intel platforms.