Understanding and Decoding Imagined Speech using Electrocorticographic Recordings in Humans

Certain brain disorders, resulting from brainstem infarcts, traumatic brain injury, stroke and amyotrophic lateral sclerosis, limit verbal communication even though the patient remains fully aware. People who cannot communicate because of neurological disorders would benefit from a system that can infer internal speech directly from brain signals. Investigating how the human cortex encodes imagined speech remains a difficult challenge, owing to the lack of behavioral and observable measures. As a consequence, the fine temporal properties of speech cannot be synchronized precisely with brain signals during internal subjective experiences such as imagined speech. This thesis aims at understanding and decoding the neural correlates of imagined speech (also called internal speech or covert speech), with speech neuroprostheses as the target application. In this exploratory work, various imagined speech features, such as acoustic sound features, phonetic representations, and individual words, were investigated and decoded from electrocorticographic signals recorded in epileptic patients in three different studies. This recording technique provides high spatiotemporal resolution via electrodes placed beneath the skull, but without penetrating the cortex. In the first study, we reconstructed continuous spectrotemporal acoustic features from brain signals recorded during imagined speech using cross-condition linear regression. Using this technique, we showed that significant acoustic features of imagined speech could be reconstructed in seven patients. In the second study, we decoded continuous phoneme sequences from brain signals recorded during imagined speech using hidden Markov models. This technique allowed us to incorporate a language model that defined phoneme transition probabilities. In this preliminary study, decoding accuracy was significant across eight phonemes in one patient.
In the third study, we classified individual words from brain signals recorded during an imagined speech word repetition task, using support-vector machines. To account for temporal irregularities during speech production, we introduced a non-linear time alignment into the classification framework. Classification accuracy was significant across five patients. In order to compare speech representations across conditions and integrate imagined speech into the general speech network, we investigated imagined speech in parallel with overt speech production and/or speech perception. Results shared across the three studies showed partial overlap between imagined speech and speech perception/production in speech areas, such as superior temporal lobe, anterior frontal gyrus and sensorimotor cortex. In an attempt to understand higher-level cognitive processing of auditory information, we also investigated the neural encoding of acoustic features during music imagery using linear regression. Although this study was not directly related to speech representations, it provided a unique opportunity to quantitatively study features of inner subjective experiences, similar to speech imagery. These studies demonstrated the potential of using predictive models for basic decoding of speech features. Despite low performance, the results show the feasibility of direct decoding of natural speech. In this respect, we highlighted numerous challenges that were encountered and suggested new avenues to improve performance.
