Unsupervised Extraction of Audio-Visual Objects

Llagostera Casanovas, Anna; Vandergheynst, Pierre

doi:10.1109/ICASSP.2011.5946938

2011

Abstract

We propose a novel method to automatically detect and extract the video modality of the sound sources that are present in a scene. For this purpose, we first assess the synchrony between the moving objects captured with a video camera and the sounds recorded by a microphone. Next, video regions presenting a high coherence with the soundtrack are automatically labelled as being part of the source. This represents the starting point for an innovative video segmentation approach, whose objective is to extract the complete audio-visual object. The proposed graph-cut segmentation procedure includes an audio-visual term that links together pixels in regions with high audio-video coherence. Our approach is demonstrated on challenging sequences presenting non-stationary sound sources and distracting moving objects.

Details

Title Unsupervised Extraction of Audio-Visual Objects

Author(s) Llagostera Casanovas, Anna; Vandergheynst, Pierre

Published in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing

Pages 2284-2287

Conference IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague, Czech Republic, May 22-27, 2011

Date 2011

Publisher IEEE Service Center, 445 Hoes Lane, PO Box 1331, Piscataway, NJ 08855-1331 USA

Keywords

audio-visual processing; graph cuts; LTS2

DOI https://doi.org/10.1109/ICASSP.2011.5946938

Laboratories LTS2

Record Appears in Scientific production and competences > STI - School of Engineering > IEM - Institut d'Electricité et de Microtechnique > LTS2 - Signal Processing Laboratory 2
Peer-reviewed publications
Conference Papers
Work produced at EPFL
Published

Record creation date 2010-10-21
