Content-based video retrieval has become a very active research area in the last decade due to the increasing number of videos shared on social networks such as YouTube and Dailymotion. While most content-based video retrieval approaches employ low-level visual features for a global analysis of the video, this paper proposes an object-based retrieval method as an alternative. The goal of the proposed method is to retrieve those key frames and shots of a video that contain a particular object, which is a challenging task because of varying viewpoints, illumination changes, and partial occlusions. To increase the reliability for 3D objects, our approach combines viewpoint-invariant region descriptors, which describe the appearance of an object, with a graph model that describes the spatial layout of the individual regions. Given a query object, provided by the user in the form of an image and a region of interest, the system retrieves the shots containing this object by analyzing a set of key frames for each shot. The robustness of our approach is demonstrated using a video in which one 3D object is recorded from different viewpoints and with partial occlusions.
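To make the retrieval idea concrete, the following is a minimal sketch of how appearance matching and a spatial-layout graph could be combined to decide whether a shot contains the query object. It is an illustration under stated assumptions, not the method of this paper: the names `Region`, `match_regions`, `neighbour_edges`, `frame_score`, and `retrieve_shots` are hypothetical, the descriptors are plain numeric tuples standing in for viewpoint-invariant region descriptors, and both the k-nearest-neighbour graph and the edge-preservation score are assumed stand-ins for the graph model.

```python
import math
from dataclasses import dataclass


@dataclass
class Region:
    """Hypothetical image region: an appearance vector and a centre point."""
    descriptor: tuple  # stand-in for a viewpoint-invariant descriptor
    center: tuple      # (x, y) position in the image


def match_regions(query, frame, max_dist=0.5):
    """Map each query region index to the frame region with the closest
    descriptor, keeping only matches closer than max_dist (assumed threshold)."""
    matches = {}
    for qi, q in enumerate(query):
        best, best_d = None, max_dist
        for fi, f in enumerate(frame):
            d = math.dist(q.descriptor, f.descriptor)
            if d < best_d:
                best, best_d = fi, d
        if best is not None:
            matches[qi] = best
    return matches


def neighbour_edges(regions, k=2):
    """Undirected k-nearest-neighbour graph over region centres
    (an assumed, simple model of spatial layout)."""
    edges = set()
    for i, r in enumerate(regions):
        others = sorted(
            (j for j in range(len(regions)) if j != i),
            key=lambda j: math.dist(r.center, regions[j].center),
        )
        for j in others[:k]:
            edges.add(frozenset((i, j)))
    return edges


def frame_score(query, frame, k=2):
    """Fraction of query-graph edges whose matched endpoints are also
    connected in the key-frame graph."""
    matches = match_regions(query, frame)
    q_edges = neighbour_edges(query, k)
    f_edges = neighbour_edges(frame, k)
    if not q_edges:
        return 0.0
    kept = 0
    for edge in q_edges:
        a, b = tuple(edge)
        if a in matches and b in matches:
            if frozenset((matches[a], matches[b])) in f_edges:
                kept += 1
    return kept / len(q_edges)


def retrieve_shots(query, shots, threshold=0.5):
    """Return the ids of shots with at least one key frame scoring at or
    above threshold. shots maps a shot id to a list of key frames, each
    key frame being a list of Region objects."""
    return [
        shot_id
        for shot_id, key_frames in shots.items()
        if any(frame_score(query, kf) >= threshold for kf in key_frames)
    ]
```

In this sketch a shot is returned as soon as one of its key frames preserves enough of the query's layout graph. A partially occluded object lowers the score without necessarily reducing it to zero, so the assumed `threshold` parameter controls how much occlusion is tolerated.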