Infoscience

Thesis

Image/video representation and scalable coding using redundant dictionaries

Compact or efficient representation for either images or image sequences is key operation to performing image and video processing tasks, such as compression, analysis, etc. The efficiency of an approximation is evaluated by the sparsity measure of the approximation, i.e., the sparsest the representation is, the more efficient it is. For image processing tasks, it is often desired to decompose the image into a linear combination of few visual primitives or features selected from a large collection of waveforms, called the dictionary. These primitives are usually designed in such a way to achieve some good approximation performances by assuming a given class of functions to model images. A common class of functions for image modeling is the set of functions having discontinuities along smooth contours or boundaries, delineating smooth geometrical regions. It was shown that the well-known separable isotropically scaled two-dimensional wavelets fail to capture the geometrical regularity inherent to images. Motivated by this issue, we designed two rich dictionaries Ds and Dst, composed of multi-scaled ridge-like functions satisfying the anisotropy and the directionality features, for image and video expansions, respectively. Regarding the spatio-temporal dictionary Dst, we defined a warping operator W, which aligns these functions along the coherent motion trajectories in order to exploit the nature of the temporal evolution in the video signal. To obtain a compact representation for the visual signal over the dictionaries Ds and Dst, only the primitives that best match the signal components must be selected. This problem is well studied in approximation theory and it is known as the problem of sparse approximation in unrestricted dictionaries, whose optimal solution is NP-hard. Greedy approaches, such as MP and OMP, have been proposed to provide sub-optimal solutions by iteratively choosing one atom at a time. However, the computational complexity of these algorithms is cumbersome and limits their applicability. To alleviate this issue, we introduced a greedy algorithm, called the M-Term Pursuit MTP, whose performances are very close to those related to MP. Then we employed both algorithms, MP and MTP, for image and image sequence decompositions and we investigated their performances. A target application has been studied, which is the scalable compression, where the aim is to generate a bit-stream that can provide both SNR and geometrical scalability. Geometrical scalability refers to the spatial scalability in case the of images and to the spatio-temporal scalability in the case of video sequences. The SNR or quality scalability is related mainly to the nature of the greedy approaches and the embedded quantization and coding. To do so, we designed a rate allocation algorithm that offers a progressively refinable bit-stream, based on the subsets approach. On the other hand, the geometrical scalability is fullfiled thanks to the structure of the dictionaries, Ds and Dst, which are designed using parametric fashion. Comparisons with state-of-the-art scalable and non-scalable codecs illustrate the good performance of the proposed compression techniques at low rates.

    Thèse École polytechnique fédérale de Lausanne EPFL, n° 3316 (2005)
    Section de génie électrique et électronique
    Faculté des sciences et techniques de l'ingénieur
    Institut de traitement des signaux
    Laboratoire de traitement des signaux 2
    Jury: Pascal Frossard, Béatrice Pesquet-Popescu, Peter Schelkens, Jean-Philippe Thiran

    Public defense: 2005-8-12

    Reference

    Record created on 2005-07-12, modified on 2016-08-08

Fulltext

Related material