Abstract

We address the problem of segmenting anomalies and unusual obstacles in road scenes for the purpose of self-driving safety. The objects in question are absent from common training sets because it is not feasible to collect and annotate examples of every possible danger on the road. In the context of semantic segmentation, anomalies are objects that do not belong to any of the predefined classes of the training set. Unusual obstacles are any objects on the road that pose a risk of collision but likewise have no available training examples. This poses a challenge for deep-learning computer vision methods, which generally require extensive training data. We work in the monocular image setting and rely on appearance cues alone, without extra stereo or LiDAR sensors, to provide a layer of safety redundancy in case those sensors are unavailable or fail. To address the difficulty posed by these constraints, we propose several specialized methods for detecting previously unseen objects. We reconstruct the input image so that the appearance of normal regions is preserved while anomalous regions are discarded, and then detect anomalies by comparing the input to its reconstruction. One of our approaches resynthesizes the image from a semantic map, which cannot represent the anomalies because they fall outside the predefined classes. In another approach, we remove parts of the image and inpaint them from the surrounding road texture, which tends to erase obstacles from the road. We obtain the final detection by training a discrepancy network to distinguish meaningful differences from reconstruction artifacts. We train the discrepancy networks without any examples of real anomalies. Instead, we generate synthetic anomalies and obstacles: we alter the classes of some ground-truth objects or inject known objects at unusual positions on the road. We further improve the injection process so that obstacle sizes are consistent with perspective foreshortening within the scene.
To this end, we use a scale map encoding the apparent size of a hypothetical object at every image location. Incorporating this scale information into the detection network also improves detection performance. We further study general obstacle detection that does not require specialized training. We take advantage of the attention mechanism of recent vision transformers and use the Shannon entropy of the attention weights to find small self-similar regions. This approach segments objects as diverse as road obstacles, maritime hazards, aircraft seen from a bird's-eye view, and rocks in lunar landscapes. To make our study possible, we collected, captured, and labeled examples of rare anomalies and obstacles. We also devised a comprehensive evaluation protocol for anomaly and obstacle segmentation. These efforts culminated in the Segment Me If You Can benchmark, now widely used in the field. Our work helps improve the safety and reliability of future self-driving vehicles by offering creative solutions to the lack of training data for rare objects. We also highlight the importance of exploring a system's limitations and failure cases, especially in a safety-critical application.
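
The abstract does not state how the scale map is obtained. As an illustration only, assume a pinhole camera with a horizontal optical axis mounted at height h above a flat road. A road point imaged at row v below the horizon row v_0 then lies at depth Z = f h / (v - v_0), so an upright object of height H standing there spans f H / Z = H (v - v_0) / h pixels, and the focal length f cancels. The hypothetical helper `flat_road_scale_map` below fills a scale map with this value. Its name, parameters, and the flat-road model are assumptions for this sketch, not the thesis's method.

```python
import numpy as np

def flat_road_scale_map(height_px, width_px, horizon_row, camera_height_m, object_height_m):
    """Hypothetical helper: apparent height (pixels) of an upright object whose base
    touches a flat road at each pixel, assuming a level pinhole camera.

    Apparent height = object_height_m * (row - horizon_row) / camera_height_m;
    the focal length cancels out. Rows at or above the horizon get 0.
    """
    rows = np.arange(height_px, dtype=np.float64)
    below_horizon = rows - horizon_row
    scale_per_row = object_height_m * below_horizon / camera_height_m
    scale_per_row[below_horizon <= 0] = 0.0  # no road visible at or above the horizon
    # Under this model the scale depends only on the row, so repeat it across columns.
    return np.tile(scale_per_row[:, None], (1, width_px))

# Example: 480x640 image, horizon at row 200, camera 1.5 m above the road, 0.5 m obstacle.
scale = flat_road_scale_map(480, 640, horizon_row=200, camera_height_m=1.5, object_height_m=0.5)
target_height_px = scale[350, 320]  # 0.5 * (350 - 200) / 1.5 = 50 pixels
```

When injecting a synthetic obstacle with its base at (row, col), one would resize it so that its pixel height matches `scale[row, col]`. Scenes with road slopes, camera pitch, or roll would need a different geometric model or a learned estimate of the scale map.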

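The abstract names the ingredients of the training-free detector, namely vision-transformer attention and the Shannon entropy of its weights, but not the exact formulation. The sketch below assumes that a patch inside a small self-similar region attends mostly to the few patches of that region, which gives a low-entropy attention row, whereas road patches spread their attention over many similar patches. It uses a toy single-head attention with queries equal to keys, a simplification since a real vision transformer uses learned projections. The helpers `softmax_attention` and `small_region_mask`, and the `rel_threshold` rule, are hypothetical.

```python
import numpy as np

def softmax_attention(features, temperature=0.1):
    """Toy single-head attention with queries = keys = L2-normalized features.

    Returns an array of shape (1, num_tokens, num_tokens) whose rows sum to 1.
    """
    f = features / np.linalg.norm(features, axis=-1, keepdims=True)
    logits = f @ f.T / temperature
    logits -= logits.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    return (weights / weights.sum(axis=-1, keepdims=True))[None]  # add a head axis

def attention_entropy(attn, eps=1e-12):
    """Shannon entropy (nats) of each query's attention row, averaged over heads."""
    h = -(attn * np.log(np.clip(attn, eps, 1.0))).sum(axis=-1)  # (heads, tokens)
    return h.mean(axis=0)  # (tokens,)

def small_region_mask(attn, grid_hw, rel_threshold=0.5):
    """Hypothetical rule: flag patches whose entropy is below a fraction of log(N)."""
    num_tokens = attn.shape[-1]
    h = attention_entropy(attn)
    return (h < rel_threshold * np.log(num_tokens)).reshape(grid_hw)

# Toy 4x4 patch grid: 14 "road" patches share one feature, 2 "obstacle" patches another.
features = np.tile(np.array([1.0, 0.0]), (16, 1))
features[[5, 6]] = np.array([0.0, 1.0])
attn = softmax_attention(features)
print(np.round(attention_entropy(attn)[[0, 5]], 2))  # road near log(14), obstacle near log(2)
print(small_region_mask(attn, (4, 4)).astype(int))
```

In this toy setting the two obstacle patches have entropy close to log 2 while road patches stay close to log 14, so only the obstacle is flagged. Attention from a trained transformer is much less clean, and the choice of layer, heads, and threshold would have to be made empirically.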