Saliency-based visual attention models compute visual saliency by combining conspicuity maps derived from various visual cues. Because the cues are of a different nature, the maps to be combined have distinct dynamic ranges, and a normalization scheme is therefore required. The scheme traditionally used is an instantaneous peak-to-peak normalization. This scheme, however, performs poorly when the relative contributions of the cues vary significantly, for instance when the type of scene changes, as when the scene under study becomes unsaturated or, worse, loses all chromaticity. To remedy this drawback, this paper proposes an alternative normalization scheme that scales each conspicuity map with respect to a long-term estimate of its maximum, a value learned initially from a large number of images. The advantage of the new method is first illustrated by several examples in which the two normalization schemes are compared. The paper then presents the results of an evaluation in which the computed visual saliency of a set of 40 images is compared to human attention, as derived from the eye movements of a population of 20 subjects. The superior performance of the new normalization scheme demonstrates its ability to handle scenes of varying type, where cue contributions vary greatly. The proposed scheme thus seems preferable in any general-purpose model of visual attention.
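The contrast between the two schemes can be sketched in code. The following is a minimal illustration, not the paper's implementation: the function names and the choice of long-term estimator (here, the mean of per-map maxima over a training set) are assumptions made for the example.

```python
import numpy as np

def peak_to_peak_normalize(conspicuity_map):
    """Instantaneous peak-to-peak normalization: rescale the map's
    current range to [0, 1]. This inflates a weak cue to full strength,
    e.g. a nearly flat chromaticity map in a desaturated scene."""
    lo, hi = conspicuity_map.min(), conspicuity_map.max()
    if hi - lo < 1e-12:  # flat map: nothing to normalize
        return np.zeros_like(conspicuity_map)
    return (conspicuity_map - lo) / (hi - lo)

def learn_long_term_max(training_maps):
    """Long-term estimate of a cue's maximum, learned from a large set
    of images (here simply the mean of per-map maxima; the paper's
    exact estimator is an assumption)."""
    return float(np.mean([m.max() for m in training_maps]))

def long_term_normalize(conspicuity_map, long_term_max):
    """Scale by the learned long-term maximum, so a cue that is weak
    in the current scene stays weak after normalization."""
    return np.clip(conspicuity_map / long_term_max, 0.0, 1.0)
```

With a nearly flat map, peak-to-peak normalization stretches the tiny residual contrast to the full [0, 1] range, whereas the long-term scheme keeps the map's contribution small, which is the behavior the paper argues for.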