We present a novel approach for contextual segmentation of complex visual scenes, based on bags of local invariant features (visterms) and probabilistic aspect models. Our approach uses context in two ways: (1) by exploiting the fact that specific learned aspects correlate with the semantic classes, which resolves some cases of visual polysemy, and (2) by formalizing the notion that scene context is image-specific: what an individual visterm represents depends on what the rest of the visterms in the same bag represent. We demonstrate the validity of our approach on a man-made vs. natural visterm classification problem. Experiments on an image collection of complex scenes show that the approach improves region discrimination and outperforms a non-contextual method. Furthermore, by subsequently applying a Markov Random Field model, we show that co-occurrence and spatial contextual information can be conveniently integrated for improved visterm classification.
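To make the image-specific use of context concrete, the following is a minimal sketch of how an aspect model could classify a single visterm differently depending on the image it appears in. It is an illustration under stated assumptions, not necessarily the exact formulation used in this work: the function names, the factorization P(c | v, d) = sum_z P(c | z) P(z | v, d), and all probability values are hypothetical and chosen only for the example.

```python
# Illustrative sketch only: names, factorization, and numbers are assumptions.

def aspect_posterior(p_v_given_z, p_z_given_d):
    """Return P(z | v, d), proportional to P(v | z) * P(z | d).

    p_v_given_z[z]: probability of the visterm under aspect z.
    p_z_given_d[z]: weight of aspect z in the image (bag) d.
    """
    joint = [pv * pz for pv, pz in zip(p_v_given_z, p_z_given_d)]
    total = sum(joint)
    if total == 0:
        raise ValueError("visterm has zero probability under every aspect")
    return [j / total for j in joint]


def classify_visterm(p_v_given_z, p_z_given_d, p_c_given_z):
    """Return P(c | v, d) = sum_z P(c | z) * P(z | v, d) for each class c."""
    post = aspect_posterior(p_v_given_z, p_z_given_d)
    n_classes = len(p_c_given_z[0])
    return [
        sum(post[z] * p_c_given_z[z][c] for z in range(len(post)))
        for c in range(n_classes)
    ]


# Toy setup (made-up values): two aspects, two classes [man-made, natural].
P_V_GIVEN_Z = [0.5, 0.5]                 # a polysemous visterm, equally likely under both aspects
P_C_GIVEN_Z = [[0.9, 0.1], [0.2, 0.8]]   # aspect 0 leans man-made, aspect 1 leans natural

# The same visterm in two images with different aspect mixtures P(z | d).
city = classify_visterm(P_V_GIVEN_Z, [0.8, 0.2], P_C_GIVEN_Z)
forest = classify_visterm(P_V_GIVEN_Z, [0.1, 0.9], P_C_GIVEN_Z)

print("city image:  ", city)
print("forest image:", forest)
```

In this toy setting the visterm alone carries no class information, so a non-contextual classifier would be unable to separate the two cases; the image-level aspect mixture is what pushes the decision toward man-made in one image and natural in the other.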