Fua, Pascal; Engilberge, Martin; Vrkic, Ivan
2023-08-30; 2023-08-15
https://infoscience.epfl.ch/handle/20.500.14299/200325

In response to the growing trend towards end-to-end learning, we propose a novel framework that advances towards an end-to-end multi-camera multi-object tracking (MC-MOT) solution and addresses challenges such as occlusions, viewpoint variations, and illumination changes. Whereas current strategies treat detection, feature extraction, association, and trajectory generation as separate stages, our approach fuses these stages, marking a stride towards an end-to-end approach. Our method introduces a neural association mechanism that links object identities across frames and cameras and handles objects appearing and disappearing. This mechanism keeps track of object identities across camera views, reducing the identity switches and track losses found in prior techniques. Our method operates online: it processes and updates tracking information as new data is received, making it suitable for applications requiring immediate action. Through experimentation, we demonstrate that our system achieves accuracy and robustness similar to those of existing state-of-the-art (SOTA) MC-MOT techniques while being able to run online. We also introduce a benchmark dataset, together with the tool used to create it, to assess our MC-MOT framework’s capabilities. For other multi-camera systems, we suggest using object appearance through a Multi-View Retrieval setup.
While fully realizing end-to-end learning remains an open challenge in MC-MOT, the proposed method’s integrated nature marks a promising step towards such a paradigm in future MC-MOT research.

end-to-end learning; multi-camera multi-object tracking (MC-MOT); neural association; graph neural networks; online tracking; benchmark dataset; multi-view retrieval
Learning for Multi-View Tracking
student work::master thesis