Abstract

Self-driving cars and delivery robots are set to shape the future of transportation, but they still have to learn how to co-exist with humans in close proximity. Autonomous systems need to detect pedestrians and understand the meaning of their actions before they can make appropriate decisions in response. Action recognition is therefore an essential task for transportation applications, yet a very challenging one, because there is no control over how far away pedestrians are or over real-world variations such as lighting, weather, and occlusions. In this paper, we focus on action recognition in the context of transportation applications and handle real-world variations and challenging scenarios by representing humans through their 2D poses. Representing human postures as sparse sets of keypoints focuses the model on essential details while providing invariance to many factors, including background scenes, lighting, textures, and clothing. However, the greatest strength of keypoints is also their main weakness: such a low-dimensional representation risks neglecting other essential elements in a scene. We propose a simple approach that uses keypoints as intermediate representations, and we aim to shed light on the tasks for which keypoints are effective representations. We conduct experiments on two datasets related to autonomous driving: TCG and TITAN.
