Standard digital cameras are sensitive to radiation in the near-infrared domain, but this additional cue is generally discarded. In this paper, we consider the scene categorisation problem for images where both the standard visible RGB channels and near-infrared information are available. Using efficient local patch-based Fisher Vector image representations, we demonstrate through thorough experimental studies the benefit of using this new type of data. We investigate which image descriptors are relevant and how best to combine them. In particular, our experiments show that when combining texture and colour information computed on visible and near-infrared channels, late fusion is the best-performing strategy and outperforms state-of-the-art categorisation methods on RGB-only data.
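
To make the notion of late fusion concrete, the following is a minimal sketch and not the implementation used in this work. It assumes that one classifier has already been trained per descriptor (for example, one on texture and one on colour) and that each returns a matrix of per-class scores. The function names `early_fusion` and `late_fusion`, the equal default weights, and the score values are all hypothetical and chosen for illustration only.

```python
import numpy as np


def early_fusion(features):
    """Early fusion: concatenate per-descriptor feature vectors
    so that a single classifier can be trained on the result."""
    return np.concatenate(features, axis=1)


def late_fusion(scores, weights=None):
    """Late fusion: weighted average of per-descriptor classifier scores.

    scores  : list of arrays, each of shape (n_images, n_classes)
    weights : optional list of per-descriptor weights
              (assumption: equal weights when omitted)
    """
    stacked = np.stack(scores, axis=0)  # (n_descriptors, n_images, n_classes)
    if weights is None:
        weights = np.full(stacked.shape[0], 1.0 / stacked.shape[0])
    weights = np.asarray(weights, dtype=float)
    # Contract the descriptor axis: result has shape (n_images, n_classes)
    return np.tensordot(weights, stacked, axes=1)


# Hypothetical scores from two separately trained classifiers
# (2 images, 2 scene classes); these numbers are illustrative only.
texture_scores = np.array([[0.2, 0.8], [0.9, 0.1]])
colour_scores = np.array([[0.4, 0.6], [0.3, 0.7]])

fused = late_fusion([texture_scores, colour_scores])
predicted = fused.argmax(axis=1)  # predicted class index per image
```

In this sketch the descriptors are combined only at the score level, so each classifier can be trained independently; early fusion, by contrast, combines the representations before any classifier is trained.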