Multiple object tracking is a crucial Computer Vision Task. It aims at locating objects of interest in the image sequences, maintaining their identities, and identifying their trajectories over time. A large portion of current research focuses on tracking pedestrians, and other types of objects, that often exhibit predictable behaviours, that allow us, as humans, to track those objects. Nevertheless, most existing approaches rely solely on simple affinity or appearance cues to maintain the identities of the tracked objects, ignoring their behaviour. This presents a challenge when objects of interest are invisible or indistinguishable for a long period of time.
In this thesis, we focus on enhancing the quality of multiple object trackers by learning and exploiting the long ranging models of object behaviour. Such behaviours come in different forms, be it a physical model of the ball motion, model of interaction between the ball and the players in sports or motion patterns of pedestrians or cars, that is specific to a particular scene.
In the first part of the thesis, we begin with the task of tracking the ball and the players in team sports. We propose a model that tracks both types of objects simultaneously, while respecting the physical laws of ball motion when in free fall, and interaction constraints that appear when players are in the possession of the ball. We show that both the presence of the behaviour models and the simultaneous solution of both tasks aids the performance of tracking, in basketball, volleyball, and soccer.
In the second part of the thesis, we focus on motion models of pedestrian and car behaviour that emerge in the outdoor scenes. Such motion models are inherently global, as they determine where people starting from one location tend to end up much later in time. Imposing such global constraints while keeping the tracking problem tractable presents a challenge, which is why many approaches rely on local affinity measures. We formulate a problem of simultaneously tracking the objects and learning their behaviour patterns. We show that our approach, when applied in conjunction with a number of state-of-the-art trackers, improves their performance, by forcing their output to follow the learned motion patterns of the scene.
In the last part of the thesis, we study a new emerging class of models for multiple object tracking, that appeared recently due to availability of large scale datasets - sequence models for multiple object tracking. While such models could potentially learn arbitrarily long ranging behaviours, training them presents several challenges. We propose a training scheme and a loss function that allows to significantly improve the quality of training of such models. We demonstrate that simply using our training scheme and loss allows to learn scoring function for trajectories, which enables us to outperform state-of-the-art methods on several tracking benchmarks.
EPFL_TH9203.pdf
openaccess
16.5 MB
Adobe PDF
9636e6e7c07727204ac6fd566e33cee8