Title: Late Fusion of the Available Lexicon and Raw Waveform-based Acoustic Modeling for Depression and Dementia Recognition
Authors: Villatoro-Tello, Esau; Dubagunta, S. Pavankumar; Fritsch, Julian; Ramirez-de-la-Rosa, Gabriela; Motlicek, Petr; Magimai-Doss, Mathew
Date issued: 2021-01-01
Date deposited: 2022-09-26
DOI: 10.21437/Interspeech.2021-1288
Handle: https://infoscience.epfl.ch/handle/20.500.14299/190987
Web of Science ID: WOS:000841879502005
Document type: conference paper (text::conference output::conference proceedings::conference paper)
Keywords: depression detection; Alzheimer's disease; mental lexicon; raw speech; multi-modal approach; late fusion

Abstract: Mental disorders, e.g. depression and dementia, are categorized as priority conditions by the World Health Organization (WHO). When diagnosing, psychologists employ structured questionnaires/interviews and different cognitive tests. Although accurate, these methods motivate a growing need to develop digital mental-health support technologies that alleviate the burden on professionals. In this paper, we propose a multi-modal approach for modeling the communication process of patients taking part in a clinical interview or a cognitive test. The language-based modality, inspired by the Lexical Availability (LA) theory from psycholinguistics, identifies the most accessible vocabulary of the interviewed subject and uses it as features in a classification process. The acoustic-based modality is processed by a Convolutional Neural Network (CNN) trained on speech signals that predominantly contain voice source characteristics. Finally, a late fusion technique based on majority voting assigns the final classification. Results show the complementarity of both modalities, reaching an overall Macro-F1 of 84% for depression and 90% for Alzheimer's dementia.
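The abstract describes late fusion by majority voting over the per-modality decisions. A minimal sketch of that voting step is below; the function name, label strings, and the assumption that the CNN contributes multiple segment-level votes alongside the lexical classifier are illustrative assumptions, not the authors' exact implementation.

```python
from collections import Counter

def majority_vote(predictions):
    """Late fusion by majority voting: each vote is one predicted label
    (e.g. from the lexical classifier or a CNN segment-level decision);
    the most frequent label wins. Ties resolve to the first-seen label.
    Illustrative sketch only, not the authors' exact implementation."""
    if not predictions:
        raise ValueError("need at least one prediction to fuse")
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical votes for one subject: one lexical decision plus two
# CNN segment-level decisions.
fused = majority_vote(["depressed", "control", "depressed"])
print(fused)  # "depressed"
```

An odd number of votes (or a defined tie-break, as in the `Counter` insertion-order behavior above) keeps the fused decision unambiguous.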