Multi-party Speech Recovery Exploiting Structured Sparsity Models
We study the sparsity of the spectro-temporal representation of speech in reverberant acoustic conditions. This study motivates the use of structured sparsity models for efficient speech recovery. We formulate underdetermined convolutive speech separation in the spectro-temporal domain as a sparse signal recovery problem, which we solve with model-based recovery algorithms. To tackle the ambiguity of real acoustics, we exploit the Image Model of the enclosure to estimate the room impulse response through an optimization with a structured sparsity constraint. Experiments conducted on real recordings demonstrate the effectiveness of the proposed approach for multi-party speech applications.
Asaei_Idiap-RR-22-2011.pdf