Multimodal Reranking of Content-based Recommendations for Hyperlinking Video Snippets

In this paper, we present an approach for topic-level search and hyperlinking of video snippets that relies on content-based recommendation and multimodal re-ranking techniques. We identify topic-level segments using transcripts or subtitles and enrich them with other metadata. Segments are indexed in a word vector space. Given a text query or an anchor, the most similar segments are retrieved using cosine similarity scores; these scores are then combined with visual similarity scores, derived from the distance to the anchor's visual concept vector. In terms of overall mean average precision, this approach performed well on the MediaEval 2013 Search and Hyperlinking task, which was evaluated over 1260 hours of BBC TV broadcasts. Experiments showed that topic segments based on transcripts from automatic speech recognition (ASR) systems led to better performance than those based on subtitles, for both search and hyperlinking. Moreover, by analyzing the effect of multimodal re-ranking on hyperlinking performance, we emphasize the merits of the rich visual information available in the anchors for the hyperlinking task, and the merits of ASR for large-scale search and hyperlinking.
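
The abstract names the main steps (vector-space indexing of segments, cosine retrieval, fusion with a visual similarity) but not their exact form. The Python sketch below illustrates one possible pipeline under explicit assumptions: TF-IDF term weighting, Euclidean distance between visual concept-score vectors mapped to a similarity via 1 / (1 + d), and a linear interpolation controlled by a hypothetical weight `alpha`. The function names, the `top_k` cut-off and the segment dictionaries are illustrative only and are not the authors' implementation.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (word -> weight) for tokenized segments.
    Assumption: smoothed IDF; the paper only says 'word vector space'."""
    n = len(docs)
    df = Counter(w for doc in docs for w in set(doc))
    idf = {w: math.log((1 + n) / (1 + c)) + 1.0 for w, c in df.items()}
    vecs = [{w: tf * idf[w] for w, tf in Counter(doc).items()} for doc in docs]
    return vecs, idf


def vectorize(tokens, idf):
    """Project a query or anchor text into the index space; unseen words are dropped."""
    return {w: tf * idf[w] for w, tf in Counter(tokens).items() if w in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def visual_similarity(a, b):
    """Hypothetical mapping of Euclidean distance between concept vectors to (0, 1]."""
    dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return 1.0 / (1.0 + dist)


def rerank(query_tokens, anchor_visual, segments, alpha=0.7, top_k=10):
    """Retrieve the top_k segments by text cosine, then re-rank them by
    alpha * text + (1 - alpha) * visual (assumed linear fusion)."""
    vecs, idf = tfidf_vectors([s["tokens"] for s in segments])
    q = vectorize(query_tokens, idf)
    text_scores = [cosine(q, v) for v in vecs]
    candidates = sorted(range(len(segments)),
                        key=lambda i: text_scores[i], reverse=True)[:top_k]
    fused = []
    for i in candidates:
        vis = visual_similarity(anchor_visual, segments[i]["visual"])
        fused.append((alpha * text_scores[i] + (1 - alpha) * vis, segments[i]["id"]))
    return sorted(fused, reverse=True)


if __name__ == "__main__":
    # Toy segments with made-up tokens and 3-dimensional visual concept scores.
    segments = [
        {"id": "s1", "tokens": ["football", "match", "goal"], "visual": [0.9, 0.1, 0.0]},
        {"id": "s2", "tokens": ["football", "stadium", "crowd"], "visual": [0.2, 0.8, 0.1]},
        {"id": "s3", "tokens": ["cooking", "recipe", "kitchen"], "visual": [0.0, 0.1, 0.9]},
    ]
    anchor_visual = [0.2, 0.8, 0.1]
    text_only = rerank(["football", "goal"], anchor_visual, segments, alpha=1.0, top_k=2)
    fused = rerank(["football", "goal"], anchor_visual, segments, alpha=0.4, top_k=2)
    print([sid for _, sid in text_only])  # ['s1', 's2']
    print([sid for _, sid in fused])      # ['s2', 's1']
```

In this toy run, the text score alone favors `s1`. Once the visual term is given enough weight, `s2`, whose concept vector matches the anchor, moves to the top. This mirrors, in miniature, the abstract's point that rich visual information in the anchor can change the hyperlinking ranking. The choice of `alpha` here is arbitrary and would need tuning on real data.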

Presented at:
ACM International Conference on Multimedia Retrieval

 Record created 2014-04-02, last modified 2018-03-17
