The signal processing community is increasingly interested in using information-theoretic concepts to build signal processing algorithms for a variety of applications. A general theory of how to apply the mathematical concepts of information theory to signal processing would therefore be of great interest. Introducing such a mathematical framework for information-theoretic signal and image processing is one of the main goals of this thesis. The framework is based on stochastic processes for information transmission and on the error probabilities associated with these transmissions. Within the developed model, the stochastic processes represent the signal processing tasks in probability space, and the error probabilities are the objective functions that drive the algorithms towards the signal processing goals. The resulting conceptual framework allows us to apply a large number of information-theoretic concepts and formulae directly to signal processing, including lower bounds on the error probabilities and concepts from rate-distortion theory. To illustrate the theoretical framework, we show that several existing information-theoretic signal processing algorithms implicitly fit our general model, which allows us to study the relationships between these algorithms. More importantly, we also apply the theory to three target applications, namely multi-modal medical image registration, joint audio-video processing, and non-parametric, unsupervised classification. The first two applications are particular examples of the general concept of multi-modal feature extraction. Multi-modal feature extraction aims to determine those features in a pair of multi-modal signals that carry maximal mutual redundancy. This means that from the feature space representation of one signal we can predict the feature space representation of the second signal with a low probability of error. 
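The link between mutual redundancy and prediction error can be made concrete with a small numerical sketch. The code below is an illustration under stated assumptions, not the algorithm developed in this thesis: it assumes discrete features summarized by a joint histogram, measures redundancy by the mutual information I(X;Y), and uses the weak form of Fano's inequality, P_e >= (H(X|Y) - 1) / log2 |X|, as one example of a lower bound on the error probability. All function names are hypothetical.

```python
import numpy as np


def entropy_bits(p):
    """Shannon entropy in bits of a probability vector (zero entries are skipped)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))


def mutual_information_bits(joint):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) from a joint histogram (rows: X, columns: Y)."""
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()          # normalize counts to probabilities
    px = joint.sum(axis=1)               # marginal of X
    py = joint.sum(axis=0)               # marginal of Y
    return entropy_bits(px) + entropy_bits(py) - entropy_bits(joint.ravel())


def fano_lower_bound(joint):
    """Weak Fano bound on the probability of error when predicting X from Y.

    Assumes X takes at least two values (more than one row in the histogram).
    """
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()
    px = joint.sum(axis=1)
    n_symbols = joint.shape[0]
    # Conditional entropy H(X|Y) = H(X) - I(X;Y)
    h_cond = entropy_bits(px) - mutual_information_bits(joint)
    return max(0.0, (h_cond - 1.0) / np.log2(n_symbols))


# Perfectly redundant features: Y determines X, so the bound vanishes.
redundant = np.eye(4)
# Independent features: Y carries no information about X.
independent = np.ones((8, 8))

mi_redundant = mutual_information_bits(redundant)
mi_independent = mutual_information_bits(independent)
bound_redundant = fano_lower_bound(redundant)
bound_independent = fano_lower_bound(independent)
```

In this toy setting, features with high mutual information admit a small lower bound on the prediction error, whereas independent features force a large one, which is the intuition behind selecting features with maximal mutual redundancy.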
After describing the mathematical basis, we illustrate the algorithm with examples of multi-modal medical image registration, where the algorithm adaptively extracts those features of the initial datasets that best perform the registration task. Again, this is done by determining the features that carry maximal mutual redundancy and therefore optimally define the spatial registration. We also apply the model to audio-video signals in order to predict the location of a speaker in a video scene from the corresponding speech signal. The resulting algorithms illustrate that the existence of features with large mutual redundancy in multi-modal signals can be used to improve multi-modal signal processing. Furthermore, the general theory enables the construction of a wide range of completely new applications. Another illustrative example of the general information-theoretic signal processing framework is information-theoretic classification. Even though the basic model for multi-modal feature extraction and for classification is identical, the final mathematical expressions are different and complementary, which allows us to draw analogies between these two distinct applications. In particular, classification algorithms, like registration algorithms, aim to minimize error probabilities. The entirely probabilistic nature of the classification framework allows us to add a hidden Markov random field to the algorithms, resulting in the concept of non-parametric hidden Markov models. The classification algorithms are validated on synthetic and natural data. For instance, we apply the non-parametric hidden Markov model to the segmentation of medical images and obtain promising results in comparison to the state of the art in this field. 
In conclusion, the experimental results show that the introduced mathematical framework leads to interesting generalizations of existing signal processing tasks and to promising results for several newly derived signal processing algorithms.