This thesis presents a novel method for segmenting video sequences based on the analysis of multiple image features. A key feature of the system is the distinction between two levels of segmentation: region segmentation and object segmentation. Regions are homogeneous areas of the image that are extracted automatically by the computer. Semantically meaningful objects are obtained by grouping regions, either automatically or through user interaction, according to the specific application. This separation relieves the computer of ill-posed semantic problems and allows greater flexibility in the use of the results. Region extraction is based on the multidimensional analysis of several image features by a spatially constrained Fuzzy C-Means algorithm. The relative weighting of the features is set by an adaptive system that takes into account the local reliability of each feature. The obtained regions are tracked over time by means of a dual strategy in which the motion-compensated projection of the segmentation mask from previous frames is used to influence the segmentation of the current frame, so as to achieve higher temporal coherence and stability.
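
For readers unfamiliar with the clustering step, the following is a minimal sketch of the standard (unconstrained) Fuzzy C-Means iteration applied to per-pixel feature vectors. It is not the spatially constrained, adaptively weighted variant developed in this thesis; the function name `fuzzy_c_means`, its parameters, and the choice of a squared Euclidean distance are assumptions made purely for illustration.

```python
import numpy as np


def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """Standard FCM on feature vectors X of shape (n_samples, n_features).

    Illustrative only: no spatial constraint and no adaptive feature
    weighting. m > 1 is the fuzziness exponent.
    """
    rng = np.random.default_rng(seed)
    # Random initial memberships, each row normalised to sum to 1.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        Um = U ** m
        # Centroids: membership-weighted means of the feature vectors.
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # Squared Euclidean distance from every sample to every centroid.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)  # guard against division by zero
        # Membership update: u_ik proportional to d_ik^(-2 / (m - 1)).
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, centers
```

In a segmentation setting, each row of `X` would hold the features of one pixel (for example colour, texture, or motion components), and a hard region label could be obtained with `U.argmax(axis=1)`. A spatial constraint and per-feature reliability weights, as described above, would modify the distance term and are deliberately left out of this sketch.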