Towards Generalizable Trajectory Prediction using Dual-Level Representation Learning and Adaptive Prompting
Existing vehicle trajectory prediction models struggle with generalizability, prediction uncertainty, and complex interactions, often because of limitations such as complex architectures customized for a specific dataset and inefficient handling of multimodal predictions. We propose Perceiver with Register queries (PerReg+), a novel trajectory prediction framework that introduces: (1) Dual-Level Representation Learning via Self-Distillation (SD) and Masked Reconstruction (MR), capturing both global context and fine-grained details. In addition, our approach of reconstructing segment-level trajectories and lane segments from masked inputs with query drop enables effective use of contextual information and improves generalization; (2) Enhanced Multimodality using register-based queries and pretraining, eliminating the need for clustering and suppression; and (3) Adaptive Prompt Tuning during fine-tuning, which freezes the main architecture and optimizes only a small number of prompts for efficient adaptation. PerReg+ achieves state-of-the-art performance on nuScenes [5], Argoverse 2 [42], and the Waymo Open Motion Dataset (WOMD) [13]. Remarkably, our model reduces the error by 6.8% on smaller datasets, and multi-dataset training further enhances generalization. In cross-domain tests, PerReg+ reduces B-FDE by 11.8% compared with its non-pretrained variant.
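The abstract describes reconstructing segment-level trajectories from masked inputs but gives no details of the masking scheme. The sketch below illustrates only the generic idea of segment-level masking, under stated assumptions: a trajectory is a list of (x, y) points, it is split into fixed-length segments, and a random fraction of whole segments is hidden and kept as reconstruction targets. The function name `mask_segments` and its parameters are hypothetical, and the "query drop" component is not modeled because the abstract does not specify it.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def mask_segments(
    traj: Sequence[Point], seg_len: int, mask_ratio: float, rng: random.Random
) -> Tuple[List[Optional[Point]], Dict[int, List[Point]]]:
    """Hypothetical segment-level masking for masked-reconstruction pretraining.

    Splits the trajectory into consecutive segments of `seg_len` points,
    hides a random subset of whole segments (masked points become None),
    and returns the masked segments as reconstruction targets.
    """
    if seg_len <= 0 or not 0.0 <= mask_ratio <= 1.0:
        raise ValueError("seg_len must be positive and mask_ratio in [0, 1]")
    # Consecutive, non-overlapping segments (the last one may be shorter).
    segments = [list(traj[i:i + seg_len]) for i in range(0, len(traj), seg_len)]
    n_mask = round(mask_ratio * len(segments))
    masked_ids = set(rng.sample(range(len(segments)), n_mask))
    visible: List[Optional[Point]] = []
    for idx, seg in enumerate(segments):
        # Masked segments are replaced point-for-point by None placeholders.
        visible.extend([None] * len(seg) if idx in masked_ids else seg)
    targets = {idx: segments[idx] for idx in sorted(masked_ids)}
    return visible, targets


# Usage: 20 points in segments of 4 gives 5 segments; 40% masking hides 2 of them.
traj = [(float(t), 0.5 * t) for t in range(20)]
visible, targets = mask_segments(traj, seg_len=4, mask_ratio=0.4, rng=random.Random(0))
```

Masking whole segments rather than isolated points forces a model to infer missing motion from surrounding context, which is consistent with the abstract's stated goal of making effective use of contextual information.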
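The abstract reports results in B-FDE without defining it. Assuming it refers to the Brier-minFDE metric used by the Argoverse 2 motion forecasting benchmark, it is the final displacement error of the best of the K predicted modes (the one whose endpoint is closest to the ground truth), plus (1 - p)^2, where p is the probability the model assigned to that mode. The minimal sketch below computes it for a single agent; the function name `brier_min_fde` is illustrative, not taken from the paper or any benchmark API.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def brier_min_fde(
    pred_endpoints: Sequence[Point], probs: Sequence[float], gt_endpoint: Point
) -> float:
    """Brier-minFDE for one agent (assumed definition, see lead-in).

    pred_endpoints: final (x, y) position of each of the K predicted modes.
    probs: probability assigned to each mode (same order as pred_endpoints).
    gt_endpoint: ground-truth final (x, y) position.
    """
    if not pred_endpoints or len(pred_endpoints) != len(probs):
        raise ValueError("need one probability per predicted mode")
    # Final displacement error of every mode's endpoint.
    fdes = [math.dist(p, gt_endpoint) for p in pred_endpoints]
    # The best mode is the one whose endpoint is closest to the ground truth.
    best = min(range(len(fdes)), key=fdes.__getitem__)
    # Penalize low confidence in the best mode with a Brier-style term.
    return fdes[best] + (1.0 - probs[best]) ** 2


# Usage: the second mode is closest (FDE = 1.0) but has p = 0.3,
# so the score is 1.0 + 0.7**2, approximately 1.49.
score = brier_min_fde([(10.0, 0.0), (8.0, 1.0), (12.0, -1.0)], [0.5, 0.3, 0.2], (8.0, 0.0))
```

Unlike plain minFDE, this metric also rewards well-calibrated mode probabilities, which is relevant to the abstract's claim about handling multimodal predictions without clustering and suppression.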
File: CVPR_2025 (13).pdf (Main Document, accepted version, open access; Adobe PDF, 690.98 KB)