The problem of clustering in urban traffic networks has mainly been studied in a static framework, by considering traffic conditions at a given time. Traffic, however, is a strongly time-variant process and needs to be studied in the spatiotemporal dimension. Investigating the clustering problem over time, in the dynamic domain, is critical to better understand and reveal the information hidden in the process of congestion formation and dissolution. The primary motivation of this paper is to study the spatiotemporal relation of congested links, to observe congestion propagation from a macroscopic perspective, and finally to identify critical pockets of congestion that can aid the design of peripheral control strategies. To achieve this, we first introduce a static clustering method that partitions the heterogeneous network into homogeneous connected sub-regions. The proposed framework guarantees the connectivity of the clusters throughout its steps, which eases the development of a dynamic framework. The proposed clustering approach has three steps. First, it obtains a set of homogeneous connected components in the network. Each component takes the form of a sequence, built by sequentially adding neighboring links with a similar level of congestion. Second, the major skeleton of the clusters is obtained from this feasible set by minimizing a heterogeneity index. Third, a fine-tuning step completes the clustering task by assigning the unclustered links of the network to proper clusters while preserving connectivity. The optimization problem in both the second and third steps is formulated as a mixed integer linear program. The approach is also extended to capture the spatiotemporal growth and formation of congestion.
The dynamic clustering is based on a fast iterative procedure that considers the spatiotemporal characteristics of congestion propagation, identifies the links with the highest degree of heterogeneity due to time-dependent conditions, and finally reclusters them to guarantee connectivity and minimize heterogeneity. An application of the developed methodologies to a megacity, using GPS data from more than 20,000 taxis, highlights the quality of the method, owing to its fast computation and proper integration of the physical properties of congestion.
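
As an illustration of the first step only, the construction of a connected sequence by sequentially adding neighboring links with a similar level of congestion might be sketched as follows. This is a minimal sketch under stated assumptions, not the formulation used in the paper: the adjacency map `neighbors`, the per-link `speed` values, the tolerance `tol`, and the rule of picking the neighbor closest to the running mean are all hypothetical choices made for the example.

```python
def grow_component(seed, neighbors, speed, tol):
    """Grow a connected sequence of links starting from `seed`.

    Assumed rule (hypothetical): among all links adjacent to the current
    sequence, append the one whose speed is closest to the sequence mean,
    and stop once the closest candidate differs by more than `tol`.
    """
    sequence = [seed]
    members = {seed}
    total = speed[seed]
    while True:
        mean = total / len(sequence)
        # Links adjacent to the sequence that are not yet part of it.
        frontier = {n for link in sequence for n in neighbors[link]} - members
        if not frontier:
            break
        # Ties are broken by link name so the result is deterministic.
        best = min(frontier, key=lambda l: (abs(speed[l] - mean), l))
        if abs(speed[best] - mean) > tol:
            break
        sequence.append(best)
        members.add(best)
        total += speed[best]
    return sequence


# Toy corridor of five links: a-b-c are slow, d-e are fast (made-up values).
neighbors = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}
speed = {"a": 10.0, "b": 12.0, "c": 11.0, "d": 30.0, "e": 31.0}

slow = grow_component("a", neighbors, speed, tol=5.0)
fast = grow_component("d", neighbors, speed, tol=5.0)
```

Because a link is only ever added when it is adjacent to the current sequence, every component produced this way is connected by construction. Running the procedure from many seeds would yield the kind of feasible set from which the second step selects the cluster skeleton.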