Structured pruning for efficient systolic array accelerated cascade Speech-to-Text Translation
We present in this paper a simple method for pruning tiles of weights to obtain sparse matrices, which does not require fine-tuning or retraining. The method is applied here to the feed-forward layers of transformers. In a first experiment, we assess the impact of such pruning on the performance of speech recognition, machine translation, and cascaded speech-to-text translation on the MuST-C dataset, for the English-to-French direction. Depending on the size of the pruned tiles (from 4x4 to 32x32), we observe that pruning rates of 15 to 40% for speech recognition and of 40 to 70% for machine translation are feasible at the cost of a 10% performance degradation. Applying this pruning method to the systolic array accelerated version of the cascade speech-to-text translation system yields speedups of up to 74x compared to the non-accelerated system. Energy consumption also benefits from structured pruning, with a reduction of up to 35%.
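To make the tile-pruning idea concrete, here is a minimal sketch of how structured pruning of a feed-forward weight matrix could look. It assumes magnitude-based tile selection (ranking tiles by their L1 norm and zeroing the lowest-scoring fraction); the abstract does not specify the selection criterion, so the `prune_tiles` function, its parameters, and the scoring rule are illustrative assumptions, not the paper's exact method.

```python
import numpy as np

def prune_tiles(weight: np.ndarray, tile: int = 16, prune_rate: float = 0.4) -> np.ndarray:
    """Zero out the lowest-magnitude (tile x tile) blocks of a weight matrix.

    Hypothetical illustration: tiles are ranked by L1 norm and roughly the
    `prune_rate` fraction with the smallest norms is set to zero.
    """
    rows, cols = weight.shape
    assert rows % tile == 0 and cols % tile == 0, "matrix must divide into tiles"

    # View the matrix as a grid of (tile x tile) blocks and score each block.
    blocks = weight.reshape(rows // tile, tile, cols // tile, tile)
    scores = np.abs(blocks).sum(axis=(1, 3))  # L1 norm per tile

    # Threshold chosen so that about `prune_rate` of the tiles fall below it.
    k = int(prune_rate * scores.size)
    threshold = np.partition(scores.ravel(), k)[k] if k > 0 else -np.inf
    mask = (scores >= threshold).astype(weight.dtype)  # 1 = keep, 0 = prune

    pruned = blocks * mask[:, None, :, None]
    return pruned.reshape(rows, cols)

# Example: prune ~40% of 16x16 tiles in a feed-forward projection matrix.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((1024, 4096)).astype(np.float32)
    W_pruned = prune_tiles(W, tile=16, prune_rate=0.4)
    print("fraction of zero weights:", float((W_pruned == 0).mean()))
```

Because whole tiles are zeroed rather than individual weights, the resulting sparsity pattern maps naturally onto tile-wise computation such as a systolic array, where entire zero tiles can be skipped.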