The growing complexity of digital signal processing applications makes a compelling case for adopting higher-level programming models, such as dataflow, to implement applications on programmable logic devices and on multi-core and many-core embedded processors. Past research has shown that raising the level of abstraction of the design stages does not necessarily incur penalties in performance or resource requirements. Dataflow programs provide high-level behavioral descriptions capable of expressing both the sequential and the parallel components of application algorithms, and they enable natural design abstractions, modularity, and portability. This paper presents an overview of the main features, recent achievements, and results of an entirely dataflow-based design flow and its associated tools, which are capable of implementing and optimizing complex signal processing applications on heterogeneous and massively parallel embedded systems.