The concept of stationarity is central to signal processing; it guarantees that the deterministic spectral properties of linear time-invariant systems are also applicable to realizations of stationary random processes. In almost all practical settings, however, the signal at hand is just a finite-size vector whose underlying generating process, if we are willing to admit one, is unknown. In this case, invoking stationarity is tantamount to stating that a single linear system (however complex) suffices to model the data effectively, be it for analysis, processing, or compression purposes. It is intuitively clear that if the complexity of the model can grow unchecked, its accuracy can increase arbitrarily (short of computational limitations); this defines a tradeoff in which, for a given data vector, each possible model yields a complexity/accuracy pair. In general one is interested in parsimonious modeling; by identifying complexity with "rate" and model misadjustment with "distortion", the situation becomes akin to an operational rate-distortion (R/D) problem in which, for each possible "rate", the goal is to find the model yielding the minimum distortion. In practice, however, only a finite palette of models is available, the complexity of which is limited for computational reasons; therefore, the entire data vector often proves too "non-stationary" for any single model. If the process is just slowly drifting, adaptive systems are probably the best choice; on the other hand, a wide class of signals exhibits a series of rather abrupt transitions between locally regular portions (e.g. speech, images). In this case a common solution is to partition the data uniformly so that the resulting pieces are small enough to appear stationary with respect to the available models.
However, if the goal is again to obtain an overall modeling which is optimal in the above R/D sense, it is necessary that the segmentation be a free variable in the modeling process; this is not the case, however, if a fixed-size time windowing is used. Note that now the reachable points in the R/D plane are indexed not just by a model but by a segmentation/model-sequence pair; their number therefore grows exponentially with the size of the data vector. This thesis is concerned with the development of efficient techniques to explore this R/D set and to determine its operational lower bound for specific signal processing problems. It will be shown that, under very mild hypotheses, many practical applications dealing with nonstationary data sets can be cast as an R/D optimization problem involving segmentation, which can in turn be solved using polynomial-time dynamic programming techniques. The flexibility of the approach will be demonstrated by a very diverse set of examples: after a general overview of the various facets of the dynamic segmentation problem in Chapter 2, Chapter 3 will use the framework to determine an operational R/D bound for the approximation of piecewise polynomial functions with respect to wavelet-based approximation; Chapter 4 will show its relevance to compression problems, and in particular to speech coding based on linear prediction and to arithmetic coding for binary sources; finally, in Chapter 5, an efficient data hiding scheme for PCM audio signals will be described, in which the optimal power allocation for the hidden data is determined with respect to the time-varying characteristics of the host signal.
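To make the dynamic-programming idea concrete, the following is a minimal sketch (not the thesis' actual formulation) of joint segmentation/model optimization in a Lagrangian form: each candidate segment [i, j) is charged a cost D(i, j) + λR(i, j), and the optimal segmentation is found in polynomial time by a shortest-path recursion over breakpoints. The constant-per-segment model and the unit rate charge per segment are illustrative assumptions only.

```python
import numpy as np

def best_segmentation(x, segment_cost, lam):
    """Minimize the total Lagrangian cost sum_k [D(i_k, j_k) + lam * R(i_k, j_k)]
    over all segmentations of x, by dynamic programming in O(n^2) cost evaluations."""
    n = len(x)
    best = np.full(n + 1, np.inf)   # best[j] = optimal cost of segmenting x[0:j]
    best[0] = 0.0
    back = np.zeros(n + 1, dtype=int)  # back-pointer to the start of the last segment
    for j in range(1, n + 1):
        for i in range(j):
            c = best[i] + segment_cost(x, i, j, lam)
            if c < best[j]:
                best[j] = c
                back[j] = i
    # Recover the optimal segment boundaries by following the back-pointers.
    cuts, j = [], n
    while j > 0:
        cuts.append((int(back[j]), j))
        j = back[j]
    return best[n], cuts[::-1]

def const_cost(x, i, j, lam):
    """Toy segment cost: distortion of the best constant (order-0) model,
    plus a hypothetical fixed rate of one parameter per segment."""
    seg = x[i:j]
    d = np.sum((seg - seg.mean()) ** 2)
    return d + lam * 1.0

# Usage: a piecewise-constant signal is optimally split at its transition.
x = np.array([0.0] * 10 + [5.0] * 10)
cost, cuts = best_segmentation(x, const_cost, lam=0.1)
```

Sweeping λ traces out the operational lower bound of the R/D set: small λ favors many short, accurately modeled segments (high rate, low distortion), while large λ forces coarse segmentations.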