Large dataflow designs arise from the behavioral specification of modern complex digital systems and/or from unfolding and transforming programs that contain loops and branches. Since deep-submicron silicon technology provides a large amount of available resources, pipelining optimization without (or with minimal) resource sharing can give significant performance advantages. High-level synthesis of CAL programs is particularly popular in computation-intensive applications (e.g., image and video processing, cryptography, and wireless communication), where feedback actors with data flows at their input and output ports represent loop-like behavior. In this work, we propose techniques for transforming, analyzing, speculatively pipelining, and optimizing large branched feedback dataflow programs. We develop an accurate algorithm and introduce fast dynamic and mixed static/dynamic heuristics that first minimize the number of pipeline stages for a given pipeline-stage time period, and then minimize the total pipeline register size through an appropriate assignment of feedbacks and instructions to pipeline stages. We also propose a genetic algorithm that tunes the heuristics for a particular design. The experimental results show that the proposed algorithms quickly produce solutions that are very close to the accurate solutions, and that they outperform earlier algorithms in both computing time and pipeline parameters.
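To make the two objectives concrete, the following minimal sketch shows a plain as-soon-as-possible (ASAP) stage assignment for an acyclic dataflow graph, followed by a count of the pipeline register bits that the assignment implies. It is an illustrative baseline under stated assumptions, not the accurate algorithm or the heuristics proposed in this work: it ignores feedbacks, branches, and speculation, and it does not attempt to reduce register size. The function names `asap_stages` and `register_bits`, the integer delays, and the per-value bit widths are all hypothetical.

```python
from collections import defaultdict


def asap_stages(delays, edges, period):
    """Place each instruction in the earliest stage that fits the stage period.

    delays: {instruction: delay}, edges: [(producer, consumer)], period: stage time.
    Assumes an acyclic graph (no feedbacks) and allows chaining inside a stage.
    """
    preds, succs = defaultdict(list), defaultdict(list)
    indegree = {v: 0 for v in delays}
    for u, v in edges:
        preds[v].append(u)
        succs[u].append(v)
        indegree[v] += 1

    ready = [v for v in delays if indegree[v] == 0]
    slot = {}  # instruction -> (stage index, finish time inside that stage)
    while ready:
        v = ready.pop()
        if delays[v] > period:
            raise ValueError(f"{v} does not fit into one stage")
        # Latest predecessor in (stage, finish time) order fixes the start.
        stage, start = max((slot[p] for p in preds[v]), default=(0, 0))
        if start + delays[v] > period:
            stage, start = stage + 1, 0  # open a new stage
        slot[v] = (stage, start + delays[v])
        for s in succs[v]:
            indegree[s] -= 1
            if indegree[s] == 0:
                ready.append(s)

    if len(slot) != len(delays):
        raise ValueError("graph has a cycle; feedbacks are not handled here")
    return {v: stage for v, (stage, _) in slot.items()}


def register_bits(stages, edges, widths):
    """Sum the register bits needed to carry each value to its last consumer."""
    last_use = dict(stages)
    for u, v in edges:
        last_use[u] = max(last_use[u], stages[v])
    return sum(widths[u] * (last_use[u] - stages[u]) for u in stages)


# Hypothetical four-instruction example with a stage period of 8 time units.
delays = {"a": 3, "b": 4, "c": 2, "d": 5}
widths = {"a": 16, "b": 16, "c": 8, "d": 32}
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]

stages = asap_stages(delays, edges, period=8)
print(max(stages.values()) + 1)             # number of pipeline stages
print(register_bits(stages, edges, widths))  # total pipeline register bits
```

In this example the ASAP placement fixes the stage count, but it leaves no freedom to move instructions between stages. Choosing among assignments with the same stage count so that fewer or narrower values cross stage boundaries is the second objective described above.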