Spatial Artificial Intelligence (spatial AI) is a field dedicated to enabling machines to perceive, understand, and interact with the physical world in 3D, bridging the gap between digital systems and the physical environment. This thesis focuses on 3D interaction in spatial AI, which is pivotal to applications such as robotic manipulation, virtual reality, and augmented reality. Central to 3D interaction is object pose estimation, which determines the 3D translation and 3D orientation of an object from visual input. In real applications, spatial AI systems are often deployed in dynamic, diverse, and unstructured environments, which demand algorithms that are both robust and capable of generalization. However, most existing object pose estimation methods operate at the instance level, restricting pose estimation to the same object instances during both training and testing. These methods become inapplicable in scenarios where previously unseen objects appear during testing. This thesis therefore addresses image-based pose estimation for previously unseen objects, aiming to develop methods that generalize to new objects. A key insight is that generalizable object pose estimation inherently relies on a reference, which plays a crucial role in both object identification and the definition of a canonical coordinate system. In this thesis, we investigate pose estimation under three reference formulations: dense-view reference images, sparse-view reference images, and a single reference image.
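As a concrete illustration (not taken from the thesis itself), an object pose can be represented as a rotation R in SO(3) together with a translation t in R^3; estimating the pose means recovering the (R, t) that maps points from the object's canonical coordinate system into camera coordinates. The sketch below, with hypothetical helper names, shows this representation for a toy rotation about the camera z-axis:

```python
import numpy as np

# Illustrative sketch only: a 6-DoF object pose is a rotation R (3x3, in SO(3))
# plus a translation t (3-vector). Pose estimation recovers (R, t) mapping
# canonical object coordinates to camera coordinates: X_cam = R @ X_obj + t.

def make_pose(yaw_rad: float, t) -> tuple[np.ndarray, np.ndarray]:
    """Toy pose: rotation about the z-axis by `yaw_rad`, plus translation `t`."""
    c, s = np.cos(yaw_rad), np.sin(yaw_rad)
    R = np.array([[c, -s, 0.0],
                  [s,  c, 0.0],
                  [0.0, 0.0, 1.0]])
    return R, np.asarray(t, dtype=float)

def transform(points: np.ndarray, R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Apply the pose to an (N, 3) array of canonical points."""
    return points @ R.T + t

# A point on the object's x-axis, rotated 90 degrees about z, then shifted.
R, t = make_pose(np.pi / 2, [1.0, 0.0, 0.0])
pts = np.array([[1.0, 0.0, 0.0]])
print(transform(pts, R, t))  # → [[1. 1. 0.]]
```

The canonical coordinate system referred to in the abstract is exactly the frame in which `pts` is expressed here; a reference (dense-view, sparse-view, or single-image) is what fixes that frame for a previously unseen object.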