Event-related causality in Stereo-EEG discriminates syntactic processing of noun phrases and verb phrases

Syntax involves complex neurobiological mechanisms, which are difficult to disentangle for multiple reasons. Using a protocol able to separate syntactic information from sound information, we investigated the neural causal connections evoked by the processing of homophonous phrases, either verb phrases (VP) or noun phrases (NP). We used event-related causality (ERC) from stereo-electroencephalographic (SEEG) recordings in 10 epileptic patients in multiple cortical areas, including language areas and their homologues in the non-dominant hemisphere. We identified the different networks involved in the processing of these syntactic operations (faster in the dominant hemisphere), showing that VPs engage a wider cortical network. We also present a proof-of-concept for the decoding of the syntactic category of a perceived phrase based on causality measures. Our findings help unravel the neural correlates of syntactic elaboration and show how a decoding based on multiple cortical areas could contribute to the development of speech prostheses for speech impairment mitigation.

Traditionally, language is analyzed in relation to four main components: the acoustic level, that is, the physical medium humans naturally exploit to convey information, together with its articulatory-phonatory counterpart; the lexicon, which is the repertoire of words expressing predicative contents and logical instructions; syntax, the set of principles to assemble larger units (phrases) from lexical items in a recursive, potentially infinite way; and semantics, an interpretative component which captures the truth value conditions for each syntactic structure. However, since acoustic and syntactic information are crucially intertwined (Ding et al., 2015), even during inner speech (Kayne, 2019; Magrassi et al., 2015), isolating syntax at the electrophysiological level appears to be an insurmountable empirical task. This is reflected in the difficulty of developing specific syntax-related tasks for experimental studies of language neurobiology, and it is responsible for the relatively limited knowledge of syntax-related processing in the brain. Understanding the neural correlates of even the most basic syntactic operations, such as merging an article with a noun (N) yielding a Noun Phrase (NP) or a pronoun with a verb (V) yielding a Verb Phrase (VP), remains a crucial challenge for brain and language research (Grodzinsky & Friederici, 2006).

In a recent study (Artoni et al., 2020), we designed and used a novel protocol aimed at isolating syntactic information from the associated acoustic information by exploiting pairs of sentences containing homophonous strings (same acoustic information but completely different syntactic content). Specifically, each pair of stimuli contained the same acoustic copy of two homophonous words, which could be interpreted either as a Noun Phrase (NP) or a Verb Phrase (VP) (Figure 1A). This approach was used to factor out any phonological and prosodic cue completely, even at the subliminal level. We used this protocol while recording the related cortical activation using stereo-electroencephalography (SEEG), an invasive recording technique with unparalleled signal-to-noise ratio and recording bandwidth (He et al., 2019; Lachaux et al., 2003).

Here, we exploited the same dataset to investigate the amplitude, the direction, and the specific frequencies of the interactions taking place between brain structures, that is, the collection of causal links elicited by different functional situations, known as effective connectivity (Penny et al., 2004). Given the utmost importance of timing, here we analyze the directed connectivity patterns elicited by a stimulus, i.e., the event-related causality (ERC). We investigated the dynamical evolution of the causal integration in response to a specific part of the time-varying stimuli (sentences), the response window (RW), which contained either the NP or the VP. To this end, and to characterize the different networks involved in the processing of the syntactic operations yielding a Noun Phrase or a Verb Phrase, we used our recently validated pipeline for the evaluation of ERC in a set RW (Cometa et al., 2021).
We also present a proof-of-concept for the decoding of the syntactic category of a perceived sentence based on causality measures, which could contribute to the future development of speech prostheses for speech impairment mitigation.

NPs and VPs elicit two unique networks

The neural networks elicited by the processing of NPs and VPs were investigated with SEEG. The data were recorded from 10 native Italian-speaking patients with no language disorders who underwent surgery for drug-resistant epilepsy. NPs and VPs were encoded in the same acoustic stimulus and could be differentiated only by their syntactic context: some Italian phrases are homophonous, such as la porta /la ˈpɔrta/, which can be interpreted either as a noun phrase ("the door") or as a verb phrase ("[s/he] brings her"). After pre-processing, close recording contacts were arranged in groups called mini-regions of interest (mini-ROIs), each represented by a prototypical contact. The grouping resulted in a total of 396 mini-ROIs in the left, or dominant, hemisphere (DH) and 577 mini-ROIs in the right, or non-dominant, hemisphere (NDH) (Figure 1B).

We restricted the analysis to connections identified within the ultra-high gamma frequency band (150 to 300 Hz) (Artoni et al., 2020). The pipeline discovered 13 significant connections for the NP condition (2 in the DH and 11 in the NDH) and 20 connections for the VP condition (6 in the DH, 13 in the NDH, and 1 from the right temporal lobe to the left temporal lobe). We observed 4 connections active for both phrases in the NDH. Of these shared connections, 3 were intra-temporal (Figure 2A). All the significant connections are shown in Table S1.

We compared the estimated connections with the recorded cortico-cortical evoked potentials (CCEPs) (Matsumoto & Kunieda, 2019), which are an indicator of the presence of a direct cortico-cortical or cortico-subcortico-cortical anatomical pathway (Matsumoto et al., 2004). Out of all the pairs of channels with a significant connection, only 11 exhibited a CCEP. The contacts involved in a significant connection and with a relevant CCEP were placed closer together than those not showing CCEPs (Mann-Whitney U_{22,11} = 53, p < 0.005) (Figure 2B).
Significant connections may be biased by clusters of closely placed contacts. Thus, to factor out a possible effect of this spatial sampling bias, we compared the distribution of the distances between pairs of contacts showing significant causal connections with the distribution of the distances between all channels (Figure 2C). We did not detect any difference between the two distributions (Mann-Whitney U_{29,47987} = 590819, p = 0.16).
Finally, more significant connections for both NPs and VPs were found in subjects with electrodes placed in the NDH than in those with the DH explored (Mann-Whitney U_{4,5} = 18.5, p < 0.05, Figure 2D). This difference was still present when normalizing the number of significant directed connections by the total number of possible connections for each subject (Mann-Whitney U_{4,5} = 18, p < 0.05). Only one subject had both hemispheres explored and showed an inter-hemispheric connection (VP, from the right temporal lobe to the left one).

VPs engage a wider network than NPs

The recording contacts participating in the NP-related network or the VP-related network were not spread across the entire cortical surface but rather clustered in specific brain zones, i.e., the regions of the anatomical parcellation of cortical gyri and sulci according to the Destrieux atlas (Destrieux et al., 2010). In total, 64 brain zones were probed in the DH and 88 in the NDH. Out of 152 cortical areas, 11 were involved in the processing of both homophonous phrases (2 in the DH and 9 in the NDH), 12 participated in the processing of the VPs alone (6 in the DH and 6 in the NDH), and 6 responded exclusively to NPs (1 in the DH and 5 in the NDH) (Figure 2E).
The connectivity estimated by the PDC is a directed causal information flow from one recording contact, called the source, to another, called the sink. For NPs, all the sources were located bilaterally in the temporal lobes (2 in the DH and 11 in the NDH). For VPs, the temporal lobes contained 17 sources (5 in the DH and 12 in the NDH). The other 3 VP sources were situated in the right occipital lobe, right frontal lobe, and left insula (Figure 2F, left). Most sinks, for both NPs and VPs, were in the two temporal lobes (DH: 2 for NPs and 4 for VPs; NDH: 6 for NPs and 8 for VPs). Other sinks were in the right insula (1 for NPs, 2 for VPs), the right frontal lobe (2 for NPs, 1 for VPs), the right central lobe (1 for NPs), the right cingulum (1 for NPs, 2 for VPs), the left frontal lobe (2 for VPs), and the left cingulum (1 for VPs) (Figure 2F, right). The lists of the cortical areas containing sources and sinks for a given connection are shown in Table S2 and Table S3.
Overall, VPs elicited more sources or sinks than NPs and engaged a higher number of different cortical areas in both hemispheres, with almost no brain zone being more active for NPs.
The results show that VPs extended the processing network beyond the temporal lobes.
Recording contacts that participated in VP processing tended to be located farther apart than those involved in NP processing, although the difference did not reach the significance level α = 0.05 (Mann-Whitney U_{13,20} = 93, p = 0.08, Figure 2G).

Syntax processing is faster in the DH

We then looked at the speed of response, or processing time, in the DH and NDH. The latencies of the peaks in the temporal evolution of the time-varying significant causalities were compared between hemispheres. For each time series, we considered only the highest peak occurring during the homophonous part of the stimuli (Figure 3A). These peaks arose earlier in the DH (Mann-Whitney U_{8,24} = 54.5, p < 0.05), for both NPs and VPs (Figure 3B).
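The latency comparison described above can be illustrated with a minimal, standard-library-only sketch. It is not the analysis code of the study: the function names, the toy time series, and the window bounds are hypothetical, and the U statistic is computed by direct pair counting (pairs where the first sample is larger, plus half of the ties). It returns no p-value; in practice a routine such as `scipy.stats.mannwhitneyu` would be used for that.

```python
def peak_latency(times, values, window):
    """Return the time of the highest value inside [window[0], window[1]]."""
    start, stop = window
    inside = [(v, t) for t, v in zip(times, values) if start <= t <= stop]
    if not inside:
        raise ValueError("no samples fall inside the window")
    _, latency = max(inside, key=lambda pair: pair[0])
    return latency


def mann_whitney_u(x, y):
    """U statistic of sample x versus sample y by pair counting.

    Each pair (xi, yj) contributes 1 if xi > yj and 0.5 if xi == yj.
    """
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


# Hypothetical causality time series (arbitrary units), one list per connection.
times = [0.0, 0.1, 0.2, 0.3, 0.4]
dh_series = [[0.1, 0.8, 0.3, 0.2, 0.1], [0.2, 0.3, 0.9, 0.1, 0.0]]
ndh_series = [[0.1, 0.2, 0.3, 0.9, 0.4], [0.0, 0.1, 0.2, 0.3, 0.7]]
response_window = (0.1, 0.4)  # assumed bounds of the homophonous segment

dh_latencies = [peak_latency(times, s, response_window) for s in dh_series]
ndh_latencies = [peak_latency(times, s, response_window) for s in ndh_series]
u_dh = mann_whitney_u(dh_latencies, ndh_latencies)
print(dh_latencies, ndh_latencies, u_dh)
```

A small U for the DH sample relative to the product of the two sample sizes means that DH peaks tend to precede NDH peaks.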

The peak latencies in the directed connections evoked by the homophonous syntagms did not correlate linearly with the distances between the recording contacts involved in those connections (Pearson's ρ = 0.07, p = 0.71, Figure 3C). Moreover, distances between recording contacts implanted in the DH and NDH and participating in an active connection were not statistically different (Mann-Whitney U_{8,24} = 83, p = 0.29). Therefore, the difference in peak latencies was likely due to the syntactic processing time rather than to the channel distribution in the two hemispheres.

Connectivity decodes homophonous phrases

The general neural connectivity estimated by the time-varying PDC was able to determine whether the subject was waiting for the sentence (baseline), listening to the initial part of the sentence, to the homophonous phrase (RW), or to its ending. We used a Long Short-Term Memory Network (LSTM) (Hochreiter & Schmidhuber, 1997) to classify the stimulus segments, with a single-trial accuracy of 83.75% (Figure 4A).
We finally extracted time-dependent features only on the identified significant connections. We used a Support Vector Machine (SVM) (Cortes & Vapnik, 1995) to predict the syntactic content of the homophonous phrase in the sentence. The accuracy was significantly above chance during the RW phase (Figure 4B).
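Because electrode placement differs between patients, decoders of this kind are naturally evaluated by holding out whole subjects. The following is a minimal sketch of leave-one-subject-out splitting with fold-averaged accuracy, using only the standard library. The names `loso_splits`, `loso_accuracy`, and `fit_predict` are hypothetical, and the toy classifier is a placeholder, not the LSTM or SVM of the study.

```python
def loso_splits(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices), one fold per subject."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


def loso_accuracy(subject_ids, labels, fit_predict):
    """Average the per-fold accuracies over all held-out subjects.

    fit_predict(train_indices, test_indices) must return one predicted
    label per test index; it stands in for any classifier.
    """
    fold_accuracies = []
    for _, train, test in loso_splits(subject_ids):
        predicted = fit_predict(train, test)
        correct = sum(p == labels[i] for p, i in zip(predicted, test))
        fold_accuracies.append(correct / len(test))
    return sum(fold_accuracies) / len(fold_accuracies)


# Hypothetical trials: the subject each trial came from and its true label.
subject_ids = ["s1", "s1", "s2", "s3", "s3"]
labels = ["NP", "VP", "NP", "VP", "VP"]


def always_np(train_indices, test_indices):
    """Placeholder classifier that ignores the training data."""
    return ["NP"] * len(test_indices)


mean_accuracy = loso_accuracy(subject_ids, labels, always_np)
print(mean_accuracy)
```

With scikit-learn, the same splitting is available through `sklearn.model_selection.LeaveOneGroupOut`, passing the subject identifiers as `groups`.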

Both models were evaluated using a Leave-One-Subject-Out (LOSO) cross-validation.

Language comprehension and production, in particular syntax processing, are complex and highly integrated tasks continuously carried out by our brain, seemingly without effort. Analysing their neural correlates thus requires sophisticated tools. One of the most promising techniques to identify the different neural processes underlying the syntactic operations leading to the processing of, for example, Noun Phrases or Verb Phrases is offered by directed connectivity evaluation related to the complexity of the large-scale networks. To our knowledge, this is the first time a difference in the connectivity elicited by NP or VP processing has been identified.
Traditionally, the problem of understanding the neural correlates of syntax is approached by studying the effects of brain lesions or with syntax-related experimental tasks administered during neurophysiological and neuroimaging acquisitions contaminated by confounding factors such as phonology or semantics (Friederici et al., 2017; Vigliocco et al., 2011). Our approach instead leverages NP/VP homophonous phrases, which allows us to factor out these confounding factors.

The shift from the analysis of isolated lexical elements, such as bare Vs and Ns, to syntactic units, namely VPs and NPs, is a necessary step toward the goal of capturing syntactic information. Lexical elements in isolation contain linguistic information, but these pieces of information are artificially expressed in single words, whereas natural linguistic expressions always involve syntactic computation.
In

Here, we decoded the acoustic stimuli exploiting 29 different speech-encoding cortical areas spanning the entire brain.
Only recently has such a strategy been used in the decoding of groups of syllables and words (Proix et al., 2022).
However, our approach relies on the time evolution of the connectivity values between recording contacts. This solution has the advantage of ensuring high inter-subject generalizability, as shown by the LOSO validation results: the connectivity features are independent of the location of the implanted leads, which may differ from subject to subject. Also, our method is well suited to be implemented in an online decoder. Moreover, the signals that drive the decoding are directly tied to the syntactic representation of the stimuli rather than to their phonological and articulatory components.
We believe that a decoding strategy that relies on multiple language-encoding cortical areas will drastically improve the performance of speech prostheses and may be the key missing piece for the development of this technology.

We showed that VP processing, compared to NP processing, elicited a significantly higher number of directed connections, linked together more brain structures both in the DH and in the NDH, and involved the activation of a wider cortical network. VP processing was distributed beyond the temporal lobes, pushing the information from sources located in the right frontal lobe and left insula to sinks in both frontal lobes, anterior cingulate regions, and right insula. This suggests a greater network small-worldness for NPs, with a preference for short-range connections over long-range ones.
Most of the literature converges on a more extended cerebral involvement in verb processing than for nouns (

In conclusion, these results represent an important step forward in human language comprehension, contributing to the full characterization of syntactic processing. We showed a specific brain activity encoding a syntactic distinction, which is faster in the DH.
Since, even from a purely formal point of view, syntactic processing cannot be compared with other computational systems, language-related or not (Chomsky, 2014; Moro, 2014a, 2014b), it is reasonable to conclude that the network highlighted here is not only specific but arguably uniquely dedicated to syntax. We show that it is possible to decode the syntactic structure of a phrase by looking at the connections elicited by speech processing between

All patients completed all experimental sessions. During the 24 h before the experimental recording, no seizure occurred, no alterations in the sleep/wake cycle were observed, and no additional pharmacological treatments were applied. No language or neuropsychological deficits were found in any patient. Also, no anatomical alterations were evident on magnetic resonance imaging. High-frequency stimulation (50 Hz, 3 mA, 5 s) through SEEG electrodes was used to assess language dominance in all subjects. Two patients also underwent an fMRI study during a language task before the implantation of the electrodes.
Thirteen patients were excluded from the analysis. Eight of them exhibited pathological SEEG contacts. The other five patients showed no explored recording contacts with a task-related significant activation in our previous study (Artoni et al., 2020). Full demographic data are shown in

The set of stimuli is based on three characteristics of Italian. First, some definite articles are pronounced exactly like some object clitic pronouns (such as [la], written as la, which can be either "the", fem. sing., or "her", fem. sing.). Second, the syntax of articles and clitic pronouns is very different: articles precede nouns, complements follow verbs, but object clitics are placed before the verb.
Third, the Italian lexicon contains several homophonous pairs of nouns and verbs, such as [ˈpɔrta] (written porta), which can mean either "door" or "brings". A pair of words such as [la ˈpɔrta] (written as la porta) can thus be interpreted either as a noun phrase ("the door") or as a verb phrase ("brings her") depending on the syntactic context (homophonous phrases). For example, in PULISCE LA PORTA CON L'ACQUA (s/he cleans the door with water), la porta is a Noun Phrase (NP), while in DOMANI LA PORTA A CASA (tomorrow s/he brings her home), la porta is a Verb Phrase (VP).
To be sure to eliminate phonological and prosodic factors, the pronunciation of one homophonous phrase was copied into its syntactic counterpart. No other semantic or lexical distinction differentiated the two types of phrases.
The acoustic stimuli were recorded using a Sennheiser Microphone MH40P48, connected via Firewire 400 to an Apple computer running OS X 10.5.8 with a Motu Ultralight Mk3 sound card. The stimuli were edited and mastered using Audiodesk 3.02 and Peak Pro7, respectively. Files were generated in 16 bits, with a sampling frequency of 44.1 kHz; intensity was normalized to 0 dB and rendered in .wav format. All sentences were read by the same person, an Italian native speaker, male, 53 years old.

Minneapolis, Minnesota) to pre-implantation T1-weighted MR images.
The SEEG sampling rate during the experiment was set to 1 kHz (patients 1-12) or 2 kHz (patients 13-23). Recordings were carried out using a 192-channel EEG-1200 (Neurofax, Nihon Kohden). All recording contacts were re-referenced to two leads in the white matter, in which electrical stimulations did not produce any manifestation.

Each subject rested in a comfortable armchair. Stimuli were delivered using the software Presentation (Neurobehavioral Systems). Phrases were delivered via audio amplifiers at the minimum volume for words to be perceived with ease, according to the subject. During stimuli delivery, subjects gazed at a 27 inches cross on a screen. A synchronization TTL trigger spike was sent to the SEEG trigger port at the beginning of the sentence. Jitter and delays were lower than 1 ms. The experiment lasted around 30 minutes. At the end of each task, subjects were always able to correctly answer short questions about the stimuli. A camera was used to monitor eye movement, silence, and any unexpected behavior from the patients.

Data pre-processing

An anti-aliasing band-pass filter (0.015-500 Hz) was applied at the hardware level. Recordings acquired at 2 kHz were down-sampled to 1 kHz. Artifacts and pathological interictal activity were identified and removed by clinicians and scientists by visual inspection. Recordings were annotated with the events triggered by the beginning of each word in all stimulus sentences. Epochs were extracted from -1.5 s to 4.5 s, time-locked to the beginning of each stimulus.

independently for each subject. Most mini-ROIs were populated by just one channel, with the most numerous ones not being populated by more than 3 recording contacts. All leads contained in a single mini-ROI were spatially very close and always belonged to the same electrode.

$$\mathbf{Y}(n) = [y_1(n), y_2(n), \dots, y_D(n)]^T \qquad (1)$$

where $D$ is the total number of channels.
The MVAR model assumes a linear relationship between the channels in $\mathbf{Y}(n)$ of the form:

$$\mathbf{Y}(n) = \sum_{k=1}^{p} \mathbf{A}_k(n)\, \mathbf{Y}(n-k) + \mathbf{E}(n)$$

where $\mathbf{A}_k(n)$ is the time-varying MVAR coefficients matrix, $\mathbf{E}(n)$ is a white noise process with covariance matrix $\Sigma$, and $p$ is the model order. The $\mathbf{A}_k(n)$ matrices were derived by using a general linear Kalman Filter (GLKF) (Milde et al., 2010). To estimate the model order $p$, the Bayesian information criterion (BIC) was used (Schwarz, 1978)

Mini-ROIs (Figure 1B), active directed connections (Figure 2A), and active cortical areas (Figure 4A) were graphically represented using the BrainNet Viewer toolbox for Matlab (Xia et al., 2013

was used as the loss function to minimize during the training of the LSTM, with the weights for each class inversely proportional to the length of the stimulus phase.
The accuracy was obtained by averaging the accuracies across all folds of the LOSO cross-validation.
Code implementation was based on the TensorFlow package for Python (Abadi et al., 2015).

Syntactic content decoding

The prediction of the content of the homophonous phrases (NP vs VP) was carried out on a trial-by-trial basis.
Only the significant connections were selected, regardless of whether the connections were significant during NP or VP processing. For each time point, a number of values equal to the number of significant connections was thus retained, corresponding to the amplitudes of the significant connections at that instant. A total of 7 features were then calculated for each time point: the statistical moments up to order 4, the median, the maximum, and the range (the difference between the maximum and the minimum).
A Support Vector Machine (Cortes & Vapnik, 1995) with a radial basis function kernel was trained for each time point. The training was carried out using a nested cross-validation procedure: (i) LOSO cross-validation was used to split the dataset into training and test sets, and (ii) for each fold of the LOSO cross-validation, 10-fold cross-validation was used to further divide the training set into training and validation sets.
The inner validation loop was used to optimize the decoder hyperparameters and to perform feature selection through the minimum redundancy maximum relevance algorithm (Radovic et al., 2017).
The time-varying accuracy was obtained by averaging the accuracies across all folds of the LOSO cross-validation procedure.
For each time point, the predicted labels were compared 1000 times with 1000 shuffled versions of the test set labels (NP or VP) to calculate the chance level. The procedure was repeated for each fold of the LOSO cross-validation, resulting in a null distribution of 1000 × (number of folds) accuracy values. An exact p-value was obtained by comparing the original accuracy with the null distribution.
The time-varying p-values were corrected for multiple comparisons using a cluster-size-based statistical non-parametric mapping approach (Nichols & Holmes, 2002) and deemed significant if lower than α = 0.05.
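The per-time-point features and the label-shuffling chance level can be sketched as follows, using only the Python standard library. This is an illustration under stated assumptions, not the study's code. "Moments up to order 4" is read here as mean, population variance, skewness, and kurtosis. The p-value uses the common (count + 1) / (shuffles + 1) estimator. The null distribution covers a single fold rather than being pooled across LOSO folds. All function names and data are hypothetical.

```python
import random
import statistics


def summary_features(amplitudes):
    """Seven summary features of the connection amplitudes at one time point."""
    n = len(amplitudes)
    mean = statistics.fmean(amplitudes)
    centered = [a - mean for a in amplitudes]
    m2 = sum(c ** 2 for c in centered) / n  # population variance
    m3 = sum(c ** 3 for c in centered) / n
    m4 = sum(c ** 4 for c in centered) / n
    return {
        "mean": mean,
        "variance": m2,
        "skewness": m3 / m2 ** 1.5 if m2 > 0 else 0.0,
        "kurtosis": m4 / m2 ** 2 if m2 > 0 else 0.0,
        "median": statistics.median(amplitudes),
        "maximum": max(amplitudes),
        "range": max(amplitudes) - min(amplitudes),
    }


def accuracy(true_labels, predicted_labels):
    """Fraction of positions where the two label sequences agree."""
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return hits / len(true_labels)


def permutation_p_value(true_labels, predicted_labels, n_shuffles=1000, seed=0):
    """Compare the observed accuracy with accuracies against shuffled labels."""
    rng = random.Random(seed)
    observed = accuracy(true_labels, predicted_labels)
    shuffled = list(true_labels)
    exceed = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # in-place permutation of the test labels
        if accuracy(shuffled, predicted_labels) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_shuffles + 1)


# Hypothetical amplitudes of five significant connections at one time point.
features = summary_features([1.0, 2.0, 3.0, 4.0, 10.0])

# Hypothetical fold in which every trial is decoded correctly.
true_labels = ["NP", "VP"] * 10
observed, p_value = permutation_p_value(true_labels, list(true_labels))
print(features, observed, p_value)
```

In a full pipeline, one such feature vector per time point would feed the classifier, and the resulting time series of p-values would then undergo the cluster-based correction.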
Code implementation was based on the scikit-learn package for Python (Pedregosa et al., 2011).

Quantification and statistical analysis

The non-normality of the data undergoing statistical testing was assessed using Shapiro-Wilk tests (Shapiro & Wilk, 1965). Sizes n1 and n2 of the independent samples undergoing Mann-Whitney tests (Neuhäuser, 2011) and the associated U statistics are reported in the Results section as U_{n1,n2} = U. The statistical significance level α was 0.05. The inter-hemispheric significant connection that arose in one subject was not considered in the tests comparing connections in the DH versus connections in the NDH. Tests were computed using the scipy package for Python (Virtanen et al., 2020