Abstract

We address the question of what visual cues, including scene objects and demographic attributes, contribute to the automatic inference of perceived ambiance in social media venues. We first use a state-of-the-art deep scene semantic parsing method and a face attribute extractor to understand how different cues present in a scene relate to human perception of ambiance in Foursquare images of social venues. We then analyze correlational links between visual cues and thirteen ambiance variables, as well as the ability of the semantic attributes to automatically infer place ambiance. We study the effect of the type and amount of image data used for learning, and compare regression results to previous work, showing that the proposed approach yields a marginal-to-moderate performance increase for up to ten of the ambiance dimensions, depending on the corpus.
