The ever increasing number of digital images in both public and private collections urges on the need for generic image content analysis systems. These systems need to be capable to capture the content of images from both scenes and objects, in a compact way that allows for fast search and comparison. Modeling images based on local invariant features computed at interest point locations has proven in recent years to achieve such capabilities and to provide a robust and versatile way to perform wide-baseline matching and search for both scene and object images. In this thesis we explore the use of local descriptors for image representation in the tasks of scene and object classification, ranking, and segmentation. More specifically, we investigate the combined use of text modeling methods and local invariant features. Firstly, our work attempts to elucidate whether a text like bag-of-visterms representation (histogram of quantized local visual features) is suitable for scene and object classification, and whether some analogies between discrete scene representations and text documents exist. We further explore the bag-of-visterms approach in a fusion framework, combining texture and color information for natural scene classification. Secondly, we investigate whether unsupervised, latent space models can be used as feature extractors for the classification task and to discover patterns of visual co-occurrence. In this direction, we show that Probabilistic Latent Semantic Analysis (PLSA) generates a compact scene representation, discriminative for accurate classification, and more robust than the bagof-visterms representation when less labeled training data is available. Furthermore, we show through aspect-based image ranking experiments, the ability of PLSA to automatically extract visually meaningful scene patterns, making such representation useful for browsing image collections. Finally, we further explore the use of the latent aspect modeling in an image segmentation task. By extending the representation resulting from the latent aspect modeling, we are able to introduce contextual information for image segmentation that goes beyond the traditional regional contextual modeling found for instance in Markov Random Field approaches.