Abstract

State-of-the-art content-based image retrieval algorithms owe their excellent performance to the rich semantics encoded in the deep activations of a convolutional neural network (CNN). These algorithms differ mostly in how the activations are combined into a compact global image descriptor. In this paper, we propose to use deep feature factorization (DFF) for this purpose. By factorizing CNN activations, we decompose an input image into semantic regions, represented both by spatial saliency heatmaps and by basis vectors that serve as descriptors of those regions. Our experiments show that, when combined into a global image descriptor, DFF surpasses the state of the art in both image retrieval and localization of the region of interest within the set of retrieved images.
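As a rough illustration only (not the authors' implementation), factorizing CNN activations can be sketched as non-negative matrix factorization of the flattened activation tensor: the spatial factor yields the saliency heatmaps and the channel factor yields the region descriptors. All names, shapes, and hyperparameters below are hypothetical.

```python
import numpy as np

def deep_feature_factorization(activations, k=3, n_iter=200, eps=1e-7):
    """Sketch: NMF of a non-negative CNN feature map (e.g. post-ReLU).

    activations: array of shape (h, w, c).
    Returns k spatial heatmaps of shape (h, w, k) and
    k basis vectors of shape (k, c) serving as region descriptors.
    """
    h, w, c = activations.shape
    A = activations.reshape(h * w, c)  # flatten spatial dimensions
    rng = np.random.default_rng(0)
    H = rng.random((h * w, k)) + eps   # spatial factor
    W = rng.random((k, c)) + eps       # channel basis vectors
    # Multiplicative updates (Lee & Seung) keep both factors non-negative
    for _ in range(n_iter):
        H *= (A @ W.T) / (H @ W @ W.T + eps)
        W *= (H.T @ A) / (H.T @ H @ W + eps)
    heatmaps = H.reshape(h, w, k)      # each slice is a saliency heatmap
    return heatmaps, W

# Usage on synthetic non-negative activations
acts = np.abs(np.random.default_rng(1).standard_normal((8, 8, 32)))
heatmaps, basis = deep_feature_factorization(acts, k=3)
```

In a retrieval pipeline, the basis vectors would then be aggregated into the compact global descriptor, while the heatmaps localize the corresponding regions in the image.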

Details