Infoscience

conference paper

On Real-Time Multi-Stage Speech Enhancement Systems

Meng, Lingjun • Coldenhoff, Jozef • Kendrick, Paul • et al.
2024
2024 IEEE International Conference on Acoustics, Speech, and Signal Processing : Proceedings
49th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Recently, multi-stage systems have stood out among deep learning-based speech enhancement methods. However, these systems are always high in complexity, requiring millions of parameters and powerful computational resources, which limits their application for real-time processing in low-power devices. In addition, the contribution of various influencing factors to the success of multi-stage systems remains unclear, which makes it challenging to reduce the size of these systems. In this paper, we extensively investigate a lightweight two-stage network with only 560k total parameters. It consists of a Mel-scale magnitude masking model in the first stage and a complex spectrum mapping model in the second stage. We first provide a consolidated view of the roles of the gain power factor, post-filter, and training labels for the Mel-scale masking model. Then, we explore several training schemes for the two-stage network and provide some insights into its superiority. We show that the proposed two-stage network, trained with an optimal scheme, achieves performance similar to that of DeepFilterNet2, an open-source model four times larger.
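To make the first stage described in the abstract more concrete, the following is a minimal NumPy sketch of Mel-scale magnitude masking in general. It is not the authors' implementation. The function names (`mel_filterbank`, `apply_mel_mask`), the triangular filterbank, the band count, and the reading of the gain power factor as an exponent applied to the band gains are all assumptions made for illustration; the record does not specify them. The second-stage complex spectrum mapping model is not modelled here.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_bands, n_fft, sample_rate):
    """Triangular Mel filterbank, shape (n_bands, n_fft // 2 + 1). Hypothetical helper."""
    n_bins = n_fft // 2 + 1
    bin_freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    # Band edges equally spaced on the Mel scale, converted back to Hz.
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_bands + 2))
    fb = np.zeros((n_bands, n_bins))
    for b in range(n_bands):
        lo, mid, hi = edges[b], edges[b + 1], edges[b + 2]
        rising = (bin_freqs - lo) / (mid - lo)
        falling = (hi - bin_freqs) / (hi - mid)
        fb[b] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def apply_mel_mask(noisy_spec, band_gains, fb, gain_power=1.0):
    """Scale a complex spectrum by Mel-band gains raised to gain_power (assumed semantics)."""
    gains = np.clip(band_gains, 0.0, 1.0) ** gain_power
    # Spread band gains onto linear-frequency bins, normalising by the filter weights.
    weights = fb.sum(axis=0)
    bin_gains = (gains @ fb) / np.maximum(weights, 1e-8)
    # Only the magnitude is scaled; the noisy phase is left untouched.
    return noisy_spec * bin_gains

# Example: one frame, 32 Mel bands, 512-point FFT at 16 kHz (all illustrative values).
fb = mel_filterbank(32, 512, 16000)
frame = np.ones(257) * (1.0 + 1.0j)
enhanced = apply_mel_mask(frame, np.full(32, 0.25), fb, gain_power=0.5)
```

In this sketch a network would predict `band_gains` per frame; masking in a small number of Mel bands rather than in every frequency bin is one way such a stage can stay small. Note that with this filterbank the DC and Nyquist bins receive zero weight and are therefore zeroed.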

Type
conference paper
DOI
10.1109/ICASSP48485.2024.10447228
Scopus ID

2-s2.0-105001494388

Author(s)
Meng, Lingjun

École Polytechnique Fédérale de Lausanne

Coldenhoff, Jozef  

École Polytechnique Fédérale de Lausanne

Kendrick, Paul

Logitech Europe S.A.

Stojkovic, Tijana

Logitech Europe S.A.

Harper, Andrew

Logitech Europe S.A.

Ratmanski, Kiril

Logitech Europe S.A.

Cernak, Milos

Logitech Europe S.A.

Date Issued

2024

Publisher

Institute of Electrical and Electronics Engineers

Published in
2024 IEEE International Conference on Acoustics, Speech, and Signal Processing : Proceedings
DOI of the book
https://doi.org/10.1109/ICASSP48485.2024
ISBN of the book

979-8-3503-4485-1

Start page

10241

End page

10245

Subjects

deep learning • multi-stage network • real-time • speech enhancement

Editorial or Peer reviewed

REVIEWED

Written at

EPFL

EPFL units
EPFL  
Event name

49th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Event acronym

ICASSP 2024

Event place

Seoul, South Korea

Event date

2024-04-14 - 2024-04-19

Available on Infoscience
May 8, 2025
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/249982