Contributors: Alahi, Alexandre Massoud; Saadatnejad, Saeed
Date: 2023-09-27
Year: 2023
DOI: 10.5075/epfl-thesis-9876
URL: https://infoscience.epfl.ch/handle/20.500.14299/201080

Forecasting is an inherent human capability when navigating: people routinely plan their paths while anticipating the likely future movements of those around them. To achieve comparable sophistication and safety, autonomous systems must likewise be predictive. Deep generative models have played a pivotal role in advancing autonomous driving in recent years. These models are used not only for forecasting trajectories (coarse-grained) and human poses (fine-grained) but also for generating realistic synthetic images. These synthetic images, which depict intricate and diverse scenarios, provide a rigorous testing ground for evaluating the efficacy of our forecasting models. The thesis begins with generative models in trajectory forecasting. We present a novel automated assessment, an essential but as yet unexplored approach, to objectively evaluate the performance of forecasting models. Our proposed adversarial generation serves as an alternative to extensive real-world testing and sheds light on how state-of-the-art models can generate forecasts that violate social norms and scene constraints. Furthermore, we leverage adversarial training to enhance model robustness against adversarial attacks and to improve social awareness and scene understanding. As the thesis progresses, we examine the impact of additional visual cues that humans subconsciously exhibit when moving through space. We present a universal approach that harnesses transformers to effectively handle the diverse visual inputs that may be available. Drawing inspiration from prompts in natural language processing, this method improves the accuracy of human trajectory forecasting by augmenting the input trajectory data with these visual cues.
Moving on to a fine-grained representation, pose forecasting, we first contribute an open-source library that includes various models, datasets, and standardized evaluation metrics, with the aim of promoting research and moving toward unified and fair evaluation. We then address the crucial but neglected aspect of uncertainty in forecasting. To enhance model performance and trust, we introduce methods for incorporating prior knowledge about the temporal pattern of uncertainty and for quantifying uncertainty through clustering and entropy measures. To cope with noisy real-world observations, we propose a generic diffusion-based approach for pose forecasting. By framing the task as a denoising problem, our method significantly outperforms state-of-the-art techniques across multiple datasets, under both clean and noisy conditions. Finally, the thesis turns to realistic image synthesis, offering a semantically-aware discriminator that enriches the training of conditional generative adversarial networks. This approach extends the traditional task of the discriminator, leading to more realistic and semantically rich image generation, which proves useful in autonomous driving simulators. In the spirit of open-source innovation, this thesis contributes to the collective knowledge in computer vision, robotics, and transportation by publicly sharing our forecasting library, along with the source code and models of our work.

Language: en
Keywords: Autonomous Driving; Motion Forecasting; Deep Generative Models; Human Pose Prediction; Human Trajectory Prediction; Adversarial Attack; Diffusion Models; Transformers; Generative Adversarial Networks; Image Synthesis
Title: Deep Generative Models for Autonomous Driving: from Motion Forecasting to Realistic Image Synthesis
Type: thesis::doctoral thesis