Learning and leveraging shared domain semantics to counteract visual domain shifts
One of the main limitations of artificial intelligence today is its inability to adapt to unforeseen circumstances. Machine Learning (ML), due to its data-driven nature, is particularly susceptible to this. ML relies on observations to learn implicit rules about inputs, outcomes, and the relationships among them, so as to solve a task. An unfortunate consequence of learning from observations, however, is that ML algorithms see only a partial, inevitably skewed version of the world. As a result, ML methods struggle to decide how to use their learned experience when the world as they know it changes.
Domain adaptation, the paradigm followed in this thesis, is an area of ML research that tackles the above problem. In the domain adaptation setup, an ML model must be adapted when domain shifts, that is, changes in the nature of the data, occur. This thesis tackles the domain adaptation problem in the particular context of visual applications, with the ultimate goal of reducing as much as possible the need for human intervention when training ML methods for visual tasks. We study the domain adaptation problem on two fronts.
A first idea is to harness the existing structure of the images. Despite visual differences, structural information related to the semantic content of an image is often preserved across images from different origins. We present a method based on Multiple Instance Learning that leverages those visual correspondences, even when the match is imperfect, to adjust the parameters of an ML model to new domains. We also introduce a self-supervised ML method that aggregates visual correspondences into a consensus heatmap. As such heatmaps are good unsupervised proxies for real annotations, we use them as a supervisory signal. In addition, we propose a Two-Stream U-Net architecture that processes different domains simultaneously. The Two-Stream U-Net combines parameter regularization, distribution matching, and the self-supervised signal from the consensus heatmaps to bridge the performance gap between models operating on different image domains.
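To illustrate how these three ingredients could be combined, the sketch below assembles a composite two-stream training objective in PyTorch. The function name, the weighting factors, and the concrete choices (weight-distance regularization, feature-moment matching, binary cross-entropy against the consensus heatmap) are hypothetical assumptions for illustration, not the exact formulation used in the thesis.

```python
import torch
import torch.nn.functional as F

def two_stream_loss(source_feats, target_feats,
                    target_pred, consensus_heatmap,
                    source_params, target_params,
                    lam_reg=1e-3, lam_match=1e-1):
    """Illustrative composite loss for a two-stream setup (assumed form)."""
    # Parameter regularization: keep the two streams' weights close.
    reg = sum(F.mse_loss(ps, pt)
              for ps, pt in zip(source_params, target_params))

    # Distribution matching on simple feature statistics (a stand-in
    # for whatever divergence the actual method uses).
    match = (F.mse_loss(source_feats.mean(dim=0), target_feats.mean(dim=0))
             + F.mse_loss(source_feats.std(dim=0), target_feats.std(dim=0)))

    # Self-supervision: consensus heatmaps act as proxy annotations
    # for the target-domain predictions.
    self_sup = F.binary_cross_entropy_with_logits(target_pred,
                                                  consensus_heatmap)

    return self_sup + lam_reg * reg + lam_match * match
```

In this reading, only the self-supervised term requires (proxy) labels on the target domain; the other two terms act as regularizers that tie the target stream to the supervised source stream.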
The second line of reasoning looks beyond the raw image information and instead maps images to compact latent representations that preserve image semantics. For this, we introduce multiflow networks: a Neural Architecture Search paradigm that assigns different network capacity to different image domains in order to extract domain-agnostic latent representations. In the multiflow formalism, domain-specific learnable gates modulate the contribution of different operations to the encoding. The end result is latent encodings from different domains that do not suffer from the domain shift.
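To make the gating idea concrete, here is a minimal PyTorch sketch of a block whose candidate operations are mixed by per-domain learnable gates. The class name, the choice of convolutions as candidate operations, and the sigmoid gating are assumptions for illustration; the thesis's multiflow formalism may define the gates and the search space differently.

```python
import torch
import torch.nn as nn

class GatedMultiflowBlock(nn.Module):
    """Illustrative block: candidate operations mixed by per-domain gates."""

    def __init__(self, channels, num_domains):
        super().__init__()
        # Candidate operations of different receptive fields (assumed set).
        self.ops = nn.ModuleList([
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.Conv2d(channels, channels, kernel_size=5, padding=2),
            nn.Conv2d(channels, channels, kernel_size=1),
        ])
        # One learnable gate per (domain, operation) pair.
        self.gates = nn.Parameter(torch.zeros(num_domains, len(self.ops)))

    def forward(self, x, domain_id):
        # Sigmoid keeps each gate in (0, 1); gates are domain-specific.
        weights = torch.sigmoid(self.gates[domain_id])
        # Domain-specific mixture of the candidate operations.
        return sum(w * op(x) for w, op in zip(weights, self.ops))
```

Because the gates are learned jointly with the operations, each domain can emphasize the operations it needs while sharing the rest, which is one way to realize the capacity assignment described above.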
Our results in biomedical image segmentation, object classification, and object detection validate the broad applicability of the methods introduced in this thesis.