With advances in digital acquisition and content analysis technologies, millions of images and video sequences are captured every day and used in a wide variety of applications. Keyword-based indexing is time-consuming and inefficient because of linguistic and semantic ambiguities. Content-based image and video retrieval systems have therefore been proposed, which search for and retrieve documents based on their content rather than on associated tags or keywords. In such systems, a query document is usually compared to every document in a database using visual features extracted from each of them. However, because these features are computed from images, which are two-dimensional projections of three-dimensional objects, they may change significantly with the viewpoint. As a result, such systems may fail to retrieve relevant content when a query shows an object from a different viewpoint than the corresponding database documents.