000255905 001__ 255905
000255905 005__ 20190401203321.0
000255905 0247_ $$2doi$$a10.1109/TASLP.2018.2867081
000255905 037__ $$aARTICLE
000255905 245__ $$aDirection of Arrival with One Microphone, a Few LEGOs, and Non-Negative Matrix Factorization
000255905 260__ $$c2018
000255905 269__ $$a2018
000255905 336__ $$aJournal Articles
000255905 520__ $$aConventional approaches to sound source localization require at least two microphones. It is known, however, that people with unilateral hearing loss can also localize sounds. Monaural localization is possible thanks to the scattering by the head, though it hinges on learning the spectra of the various sources. We take inspiration from this human ability to propose algorithms for accurate sound source localization using a single microphone embedded in an arbitrary scattering structure. The structure modifies the frequency response of the microphone in a direction-dependent way, giving each direction a signature. While knowing those signatures is sufficient to localize sources of white noise, localizing speech is much more challenging: it is an ill-posed inverse problem which we regularize by prior knowledge in the form of learned non-negative dictionaries. We demonstrate a monaural speech localization algorithm based on non-negative matrix factorization that does not depend on sophisticated, designed scatterers. In fact, we show experimental results with ad hoc scatterers made of LEGO bricks. Even with these rudimentary structures we can accurately localize arbitrary speakers; that is, we do not need to learn the dictionary for the particular speaker to be localized. Finally, we discuss multi-source localization and the related limitations of our approach.
000255905 6531_ $$adirection-of-arrival estimation
000255905 6531_ $$agroup sparsity
000255905 6531_ $$amonaural localization
000255905 6531_ $$anon-negative matrix factorization
000255905 6531_ $$asound scattering
000255905 6531_ $$auniversal speech model
000255905 6531_ $$aLCAV-ADPA
000255905 700__ $$0249797$$aEl Badawy, Dalia
000255905 700__ $$0244456$$aDokmanic, Ivan
000255905 773__ $$j26$$k12$$q2436-2446$$tIEEE/ACM Transactions on Audio, Speech and Language Processing
000255905 8560_ $$fdalia.elbadawy@epfl.ch
000255905 8564_ $$s2470165$$uhttps://infoscience.epfl.ch/record/255905/files/elbadawy_dokmanic_taslp2018.pdf
000255905 909C0 $$0252056$$mpaolo.prandoni@epfl.ch$$mmihailo.kolundzija@epfl.ch$$pLCAV$$xU10434
000255905 909CO $$ooai:infoscience.epfl.ch:255905$$pIC$$particle$$qGLOBAL_SET
000255905 960__ $$adalia.elbadawy@epfl.ch
000255905 961__ $$amanon.velasco@epfl.ch
000255905 973__ $$aEPFL$$rREVIEWED$$sACCEPTED
000255905 980__ $$aARTICLE
000255905 981__ $$aoverwrite