Infoscience

doctoral thesis

Visual Saliency Prediction for Natural Images, Comics Panels, and Comics Pages

Aydemir, Bahar  
2024

Recent years have seen remarkable advancements in saliency estimation methods, mainly due to deep learning models leveraging the widespread availability of real-world images. However, saliency is profoundly shaped by the intricacies of the human visual attention system, and modeling it requires more than large-scale data and powerful models. To address this, we incorporate specific characteristics of the human visual system into deep learning approaches, with the aim of improving saliency prediction. Moreover, current saliency prediction approaches do not generalize to domains characterized by limited data, such as cartoons, sketches, or comics. This challenge is made more pronounced by the disparity between the photographic domain and domains with sparse data.

To bridge the gap between deep learning approaches and the human visual system, and to overcome the limitations of saliency prediction in the comics domain, we adopt a multifaceted approach: we model dissimilarities among objects within content-rich scenes to account for relationships between objects; we consider the temporal dynamics of attention, since attention evolves over time; and we introduce a data augmentation method based on photometric alterations for saliency prediction. Collectively, these methods lead to a more precise and dynamic understanding of saliency in both natural images and comics.

In the first research axis, we introduce a saliency prediction model that explicitly models object dissimilarities in content-rich real-world photographic scenes. We calculate the size and appearance dissimilarities of the objects and fuse them with the deep saliency features. We show that incorporating these dissimilarities enhances saliency prediction in natural images.

In the second research axis, we study the temporal dimension of saliency. Because viewers look at different regions of an image over time, we use temporal information to improve saliency prediction. Specifically, we learn time-specific saliency predictions by exploiting temporal information. We show that the temporally evolving patterns in human attention play an important role in saliency prediction in natural images.

Saliency prediction models are constrained by the limited diversity and quantity of labeled data. Standard data augmentation techniques such as cropping and rotation change the scene composition and therefore affect saliency. We therefore introduce a novel data augmentation method for deep saliency prediction that involves editing contrast, brightness, and color while preserving the overall structure of the scenes. This approach enables us to generate images that closely resemble the photometric characteristics of the target domains.

Lastly, we analyze these methods in the domain of comics, which feature stylized elements, sequential reading, and artistic use of brightness, contrast, and color to emphasize story elements and convey emotions. We mitigate the disparities between saliency prediction in natural images and comics through our earlier contributions, which encompass object dissimilarity, temporal aspects, and adjustments to brightness, color, and contrast.

In summary, we study visual attention, gaze behavior, and their estimation with deep neural networks in the context of natural images and comics. We advance the understanding of visual attention and saliency prediction, benefiting both natural images and comics, and pushing the boundaries of saliency prediction across diverse visual domains.
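The photometric data augmentation idea in the abstract (editing contrast, brightness, and color while leaving the scene layout untouched) can be illustrated with a minimal sketch. This is not the thesis's method, only a simple per-pixel version of the general idea. The function names `photometric_augment` and `augment_pair`, the parameter ranges, and the choice to reuse the ground-truth saliency map unchanged are all assumptions made for illustration.

```python
import random


def clamp(x, lo=0.0, hi=1.0):
    """Keep a channel value inside the valid [lo, hi] range."""
    return max(lo, min(hi, x))


def photometric_augment(image, brightness=0.0, contrast=1.0, saturation=1.0):
    """Hypothetical helper: alter brightness, contrast, and color only.

    `image` is a list of rows, each row a list of (r, g, b) floats in [0, 1].
    No pixel is moved, so the spatial structure of the scene is preserved.
    """
    pixels = [p for row in image for p in row]
    # Mean gray level of the image, used as the pivot for contrast scaling.
    mean = sum(sum(p) / 3.0 for p in pixels) / len(pixels)
    out = []
    for row in image:
        new_row = []
        for r, g, b in row:
            gray = (r + g + b) / 3.0
            # Color: interpolate between the gray value and the original color.
            channels = [gray + saturation * (c - gray) for c in (r, g, b)]
            # Contrast: scale around the mean. Brightness: add a constant offset.
            channels = [clamp(mean + contrast * (c - mean) + brightness)
                        for c in channels]
            new_row.append(tuple(channels))
        out.append(new_row)
    return out


def augment_pair(image, saliency_map, rng):
    """Assumption: the ground-truth map is reused as-is for the edited image."""
    params = {
        "brightness": rng.uniform(-0.2, 0.2),   # illustrative range
        "contrast": rng.uniform(0.7, 1.3),      # illustrative range
        "saturation": rng.uniform(0.5, 1.5),    # illustrative range
    }
    return photometric_augment(image, **params), saliency_map


if __name__ == "__main__":
    rng = random.Random(0)
    img = [[(0.2, 0.4, 0.6), (0.8, 0.1, 0.3)],
           [(0.5, 0.5, 0.5), (0.0, 1.0, 0.25)]]
    sal = [[0.1, 0.9], [0.0, 0.3]]
    new_img, new_sal = augment_pair(img, sal, rng)
    print(len(new_img), len(new_img[0]), new_sal is sal)
```

Unlike cropping or rotation, these edits never change where objects are, which is why the sketch can keep the same pixel grid for the image and its saliency map. Strong photometric edits may still shift human attention, so reusing the map unchanged is a simplification.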

Name: EPFL_TH9982.pdf
Type: Main Document
Version: http://purl.org/coar/version/c_be7fb7dd8ff6fe43
Access type: openaccess
License Condition: N/A
Size: 93.46 MB
Format: Adobe PDF
Checksum (MD5): e152d197c45ba9579904040249c1a8f0


Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved