Visual Scene Understanding for Transportation: From Detecting Objects To Relationships

Alahi, Alexandre Massoud; Adaimi, George
2022-11-28
DOI: 10.5075/epfl-thesis-8994
https://infoscience.epfl.ch/handle/20.500.14299/192769

Transportation, which deals with moving people and goods, has a clear impact on the economic development of our society and on our well-being. Traditionally, transportation was studied and analyzed using expensive sensors, such as induction loops, that are difficult to maintain. Nowadays, with the prevalence of cameras, which are inexpensive and can be easily mounted in various areas, computer vision has found its way into the transportation domain. Computer vision is a field of artificial intelligence (AI) that extracts high-level, relevant information from visual data. Despite many advances in computer vision, several challenges arise when vision-based transportation systems are deployed in complex and uncontrolled environments. This doctoral thesis introduces various vision-based deep learning methods that can handle the challenges encountered in the transportation and mobility domain and that are crucial in providing a high-level understanding of a scene. We first tackle an important task in transportation: detecting different agents (e.g., vehicles and pedestrians) in diverse environments such as roads, parking lots, or sidewalks. To solve this task, we propose to leverage dense fields, referred to as Butterfly Fields, as representations to localize and classify all objects in the scene. Using dense representations enables our method to handle the challenges of object occlusion and scale variation in aerial images. Furthermore, understanding the movement of goods and people over time is critical for various transportation operations. Tracking people and objects requires re-identifying agents across images, which can be challenging due to the visual ambiguity that occurs when independent traffic agents, especially vehicles, are visually similar.
Thus, we address this challenge with a confidence-based learning framework and demonstrate a performance boost for several re-identification methods irrespective of the type of agent, whether it is a vehicle or a person. We further show the benefit and efficacy of our detector and tracker on a common and important traffic management task. Beyond detecting and re-identifying agents in a scene, extracting the relationships between different objects is another important task in transportation. This problem can be solved using scene graph generation methods, which extract a structured semantic representation of a scene by detecting the objects present and their relationships. In transportation, scene graphs are mainly used as inputs to real-time downstream decision-making tasks, so it is important that such methods be efficient while providing good performance. To that end, we develop an efficient one-step scene graph generation method that provides a comprehensive understanding of a scene. Finally, since open source is an enabler of innovation, we contribute to the collective knowledge in the field of computer vision and transportation by publicly sharing our new roundabout dataset, along with the source code and models of our work.

en
Object Detection; Aerial Images; Person Re-Identification; Vehicle Re-Identification; Traffic Flow Estimation; Scene Graph Generation; Scene Understanding
thesis::doctoral thesis