In this paper we propose a design methodology to partition dataflow applications on a multi clock domain architecture. This work shows how starting from a high level dataflow representation of a dynamic program it is possible to reduce the overall power consumption without impacting the performances. Two different approaches are illustrated, both based on the post-processing and analysis of the causation trace of a dataflow program. Methodology and experimental results are demonstrated in an at-size scenario using an MPEG-4 Simple Profile decoder.