The perception of a visual stimulus is strongly modulated by surrounding elements. This phenomenon, called contextual modulation, is exemplified by a large number of visual illusions, e.g. the tilt illusion, in which a vertical grating appears tilted because of its context. Contextual modulation in perception is interesting because it shows that physically identical stimuli do not yield identical perceptual and neural effects. While explanations of contextual modulation are often based on local interactions within the neural machinery, recent empirical evidence casts doubt upon the explanatory value of such local approaches. This work investigates contextual modulation by measuring performance on targets flanked by different contextual configurations. The best predictor of performance was found to be not the local context of the target but the global organization of the whole stimulus. This was shown for targets presented in the fovea as well as in the periphery. These findings strongly contradict explanations based on local interactions between the target and the flankers. The results are interpreted at the level of perceptual organization as evidence for a grouping account of contextual modulation: the more strongly targets and flankers are perceptually grouped, the more strongly the flankers interfere with target perception (and the poorer performance is). Furthermore, it is proposed that contextual modulation can be used as a quantitative measure to investigate the rules governing the grouping of elements into meaningful wholes.