Maximizing the data throughput is a very common implementation objective for several streaming applications. Such task is particularly challenging for implementations based on many-core and multi-core target platforms because, in general, it implies tackling several NP- complete combinatorial problems. Moreover, an efficient design space exploration requires an accurate evaluation on the basis of dataflow program execution profiling. The focus of the paper is on the methodology challenges for obtaining accurate profiling measures. Experimental results validate a many-core platform built by an array of Transport Triggered Architecture processors for exploring the partitioning search space based on the execution trace analysis.