Computational Aesthetics and Image Enhancements using Deep Neural Networks
Imaging devices have become ubiquitous in modern life, and many of us capture an increasing number of images every day. When we choose which of these images to share or store, our primary selection criterion is visual appeal. Yet quantifying visual pleasantness is a challenge, as image aesthetics correlate not only with low-level image quality, such as contrast, but also with high-level visual processes, such as composition and context. For most users, obtaining aesthetically pleasing images requires considerable manual effort, professional knowledge, or both. Automatic solutions would thus benefit a large community.
This thesis proposes several computational approaches to help users obtain the images they want. The first technique automatically measures aesthetic quality, which helps users select and rank images. We formulate aesthetics prediction as a regression task and train a deep neural network on a large image aesthetics dataset. The unbalanced distribution of aesthetics scores in the training set can bias the trained model towards certain aesthetics levels. We therefore propose adding sample weights during training to counter this bias. Moreover, we build a loss function on the histograms of user labels, which enables the network to predict not only the average aesthetic quality but also the difficulty of such predictions. Extensive experiments demonstrate that our model outperforms the previous state of the art by a notable margin.
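The sample-weighting idea can be illustrated with a minimal sketch. It assumes inverse-frequency weights over score bins and a weighted squared error; the abstract does not state the actual weighting scheme or loss, so the function names, the bin count, and the 1 to 10 score range below are hypothetical choices for illustration only. The last helper shows how a histogram of user votes carries both an average score and a spread, which is one plausible reading of "difficulty".

```python
from collections import Counter


def inverse_frequency_weights(scores, num_bins=10, lo=1.0, hi=10.0):
    """Weight each sample inversely to the population of its score bin.

    Assumption: scores lie in [lo, hi]; the binning is illustrative only.
    """
    width = (hi - lo) / num_bins
    # Clamp so that out-of-range or boundary scores fall in a valid bin.
    bins = [max(0, min(int((s - lo) / width), num_bins - 1)) for s in scores]
    counts = Counter(bins)
    raw = [1.0 / counts[b] for b in bins]
    # Normalize so that the weights average to 1.
    scale = len(raw) / sum(raw)
    return [w * scale for w in raw]


def weighted_mse(predictions, targets, weights):
    """Squared error in which rare score levels count more."""
    total = sum(w * (p - t) ** 2
                for p, t, w in zip(predictions, targets, weights))
    return total / sum(weights)


def histogram_mean_std(histogram, lowest_level=1):
    """Mean and standard deviation of a vote histogram over score levels."""
    n = sum(histogram)
    levels = range(lowest_level, lowest_level + len(histogram))
    mean = sum(level * c for level, c in zip(levels, histogram)) / n
    var = sum(c * (level - mean) ** 2 for level, c in zip(levels, histogram)) / n
    return mean, var ** 0.5
```

In this sketch, two images with the same mean score but different vote spreads yield different standard deviations, so a network trained on histograms rather than means has access to that extra signal.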
Additionally, we propose an image cropping technique that automatically outputs aesthetically pleasing crops. Given an input image and a crop template, we first extract a sufficient number of candidate crops. These crops are then ranked by the scores predicted by the pre-trained aesthetics network, and the best crop is returned to the user. We conduct psychophysical experiments to validate the performance.
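The extract-then-rank pipeline can be sketched as follows. This is a minimal illustration, assuming that the template is an aspect ratio and that candidates are sampled on a regular grid at a few scales; the abstract does not describe the actual sampling strategy. `score_fn` is a hypothetical stand-in for the pre-trained aesthetics network.

```python
def candidate_crops(img_w, img_h, aspect, scales=(0.5, 0.7, 0.9), steps=5):
    """Enumerate (x, y, w, h) crops with aspect ratio w / h on a grid.

    Assumption: `steps` >= 2 grid positions per axis and per scale.
    """
    crops = []
    for scale in scales:
        # Largest box with the requested aspect ratio that fits, then shrink.
        w = round(min(img_w, img_h * aspect) * scale)
        h = min(img_h, round(w / aspect))
        for i in range(steps):
            for j in range(steps):
                # Integer division keeps every crop inside the image.
                x = (img_w - w) * i // (steps - 1)
                y = (img_h - h) * j // (steps - 1)
                crops.append((x, y, w, h))
    return crops


def best_crop(crops, score_fn):
    """Return the candidate with the highest predicted score."""
    return max(crops, key=score_fn)


def toy_score(crop):
    """Hypothetical scorer: prefers large crops centered in a 640x480 image."""
    x, y, w, h = crop
    cx, cy = x + w / 2, y + h / 2
    return w * h - abs(cx - 320) - abs(cy - 240)
```

Calling `best_crop(candidate_crops(640, 480, 1.0), toy_score)` selects the largest centered square; replacing `toy_score` with a learned predictor changes only the ranking step, not the candidate generation.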
We further present a keyword-based image color re-rendering algorithm. For this task, the colors in the input image are modified to be more visually appealing according to a keyword specified by the user. Our algorithm achieves this by applying local color re-rendering operations. A novel weakly-supervised semantic segmentation algorithm is developed to locate the keyword-related regions where the color re-rendering operations are applied. The color re-rendering process benefits from the segmentation network in two ways. First, we obtain more accurate correlation measurements between keywords and color characteristics, which leads to better color re-rendering results. Second, the artifacts caused by the color re-rendering operations are significantly reduced.
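To make "local" concrete, the sketch below restricts a color operation to a region by blending with a soft mask. It is an assumption-laden illustration: the abstract does not specify the color operations or how the segmentation output is applied, so the channel gain and the per-pixel linear blend are hypothetical. Pixels are RGB tuples in [0, 1], and `mask` holds one value in [0, 1] per pixel, as a segmentation network might produce.

```python
def scale_channel(pixels, channel, gain):
    """Hypothetical color operation: multiply one channel, clamped to 1.0."""
    out = []
    for px in pixels:
        px = list(px)
        px[channel] = min(1.0, px[channel] * gain)
        out.append(tuple(px))
    return out


def blend_with_mask(original, recolored, mask):
    """Per-pixel blend: mask 1 keeps the recolored value, mask 0 the original."""
    out = []
    for orig_px, new_px, m in zip(original, recolored, mask):
        out.append(tuple(m * n + (1.0 - m) * o
                         for o, n in zip(orig_px, new_px)))
    return out
```

Because the mask is soft rather than binary, the modification fades out at region boundaries, which is one common way to limit visible seams between modified and unmodified areas.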
To avoid the need for keywords when enhancing image aesthetics, we explore generative adversarial networks (GANs) for automatic image enhancement. GANs are known for learning transformations between images directly from the training data. To learn the image enhancement operations, we train the GANs on an aesthetics dataset with a combination of three losses. The first two are standard generative losses that encourage the generated images to be natural and similar in content to the input images. The third is a proposed aesthetics loss that aims to improve the aesthetic quality of the generated images. Together, the three losses direct the GANs to apply appropriate image enhancement operations.
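One common way to combine such losses is a weighted sum, sketched below. This form is an assumption: the abstract says only that three losses are combined, and the symbols and weights $\lambda$ are introduced here for illustration. $G$ denotes the generator, $\mathcal{L}_{\text{adv}}$ the loss that encourages natural-looking outputs, $\mathcal{L}_{\text{content}}$ the loss that keeps the output close in content to the input, and $\mathcal{L}_{\text{aes}}$ the aesthetics loss.

```latex
\mathcal{L}(G) =
    \lambda_{\text{adv}} \, \mathcal{L}_{\text{adv}}(G)
  + \lambda_{\text{content}} \, \mathcal{L}_{\text{content}}(G)
  + \lambda_{\text{aes}} \, \mathcal{L}_{\text{aes}}(G)
```

Under this assumed form, the first two terms constrain the output to remain a plausible rendering of the input, while the third pushes it towards higher predicted aesthetic quality; the relative weights set the trade-off between the two goals.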