Abstract

The successes of deep learning for semantic segmentation can be attributed, in part, to scale: both the size of these computational architectures and the amount of labeled data they are trained on. These resource requirements limit the applicability of segmentation networks to scenarios where labeled data is expensive or deployment conditions do not allow for large models. This dissertation aims to alleviate these problems by (a) transferring the knowledge of trained networks to new domains without the need for labeled data, and (b) improving the computational efficiency of segmentation transformers through a differential allocation of computation to input regions.

The first part of this dissertation focuses on reducing the amount of labeled data needed to train these models by transferring knowledge from existing datasets and bridging the domain gap between them. We first tackle model adaptation, where a segmentation network trained on source data is adapted to a target domain using only unlabeled target data, by improving the network's confidence in its predictions. We then study test-time adaptation, where the goal is to adapt to a plausible domain shift given access to only a batch of samples at inference time. To do so, we train the network to be confident and stable under input perturbations. Experimental results show that methods improving robustness to parameter or input perturbations largely compensate for the absence of source data in the adaptation process.

The second part of this dissertation addresses the computational requirements of deep networks. We first present patch pausing, a method for improving the inference efficiency of segmentation transformers: we stop processing input patches deemed to have been processed enough to produce an accurate segmentation, as determined by the network's segmentation confidence at intermediate layers. We then turn to compute-aware evaluation of deep learning methods, with a focus on optimizers. We argue that a fair assessment must include not only the performance obtained but also the cost of finding the hyperparameter configuration that yields that performance. An optimization algorithm that achieves good performance with relatively little tuning effort and computational cost is more valuable in practice than one that performs better, but only after more tuning. We conclude that, under our experimental setup, Adam is the most practical choice.
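
To make the confidence-based model adaptation idea concrete, below is a minimal PyTorch sketch of adapting a pretrained segmentation network with only unlabeled target images by minimizing per-pixel prediction entropy. The function name, the loss form, and the surrounding training setup are illustrative assumptions, not the dissertation's exact method.

    import torch.nn.functional as F

    def entropy_minimization_step(model, target_images, optimizer):
        """One adaptation step: increase prediction confidence on unlabeled
        target images by minimizing per-pixel entropy (illustrative only)."""
        logits = model(target_images)              # (B, C, H, W) segmentation logits
        probs = F.softmax(logits, dim=1)
        log_probs = F.log_softmax(logits, dim=1)
        entropy = -(probs * log_probs).sum(dim=1)  # per-pixel entropy, (B, H, W)
        loss = entropy.mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()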
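
Similarly, the test-time adaptation objective described above can be sketched as encouraging the network to be confident on a batch of target inputs and stable under an input perturbation. Gaussian noise as the perturbation and the KL consistency term are assumptions made for illustration; the dissertation's perturbations and objective may differ.

    import torch
    import torch.nn.functional as F

    def test_time_adaptation_step(model, batch, optimizer, noise_std=0.1):
        """Encourage predictions that are both confident and stable under an
        input perturbation, using a single unlabeled batch at inference time."""
        logits_clean = model(batch)
        logits_noisy = model(batch + noise_std * torch.randn_like(batch))

        probs_clean = F.softmax(logits_clean, dim=1)
        # Confidence term: per-pixel entropy of the clean prediction.
        entropy = -(probs_clean * F.log_softmax(logits_clean, dim=1)).sum(dim=1).mean()
        # Stability term: KL divergence between clean and perturbed predictions.
        consistency = F.kl_div(F.log_softmax(logits_noisy, dim=1),
                               probs_clean, reduction="batchmean")

        loss = entropy + consistency
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()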
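
The patch-pausing idea admits a schematic sketch: after each transformer block, a hypothetical auxiliary head estimates per-patch segmentation confidence, and patches above a threshold are paused. For clarity, this sketch still runs every block on all tokens and merely freezes paused ones; an efficient implementation would process only the active tokens. The block and head interfaces and the threshold are placeholders, not the architecture used in the dissertation.

    import torch

    def forward_with_patch_pausing(blocks, aux_heads, tokens, threshold=0.9):
        """Schematic: after each transformer block, pause (stop updating) patch
        tokens whose intermediate segmentation confidence exceeds a threshold."""
        active = torch.ones(tokens.shape[:2], dtype=torch.bool,
                            device=tokens.device)          # (B, N) active-patch mask
        for block, head in zip(blocks, aux_heads):
            updated = block(tokens)                        # full pass, (B, N, D)
            # Only active patches receive the update; paused patches stay frozen.
            tokens = torch.where(active.unsqueeze(-1), updated, tokens)
            probs = head(tokens).softmax(dim=-1)           # per-patch class probabilities
            confidence = probs.max(dim=-1).values          # (B, N)
            active = active & (confidence < threshold)     # pause confident patches
        return tokens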
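
Finally, one simple way to make tuning cost explicit when comparing optimizers, in the spirit of the compute-aware evaluation argued for above, is to report the best validation score reachable within each cumulative tuning budget. This incumbent-curve computation is only an illustration of the idea, not necessarily the protocol used in the dissertation.

    def performance_vs_tuning_budget(trials):
        """Given (compute_cost, validation_score) pairs from hyperparameter trials,
        return the best score achievable within each cumulative compute budget."""
        cumulative_cost, best_score = 0.0, float("-inf")
        curve = []
        for cost, score in trials:
            cumulative_cost += cost
            best_score = max(best_score, score)
            curve.append((cumulative_cost, best_score))
        return curve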
