Divergent sensory processing converges in frontal cortex for a planned motor response



SUMMARY
Purposeful behavior requires planning of actions based on external information. However, neuronal mechanisms converting sensory input into a motor plan remain elusive. Here, we combined wide-field calcium imaging, multi-area single-neuron recordings and focal optogenetic inactivation to reveal the precise sequence of cortical activity transforming sensory information into motor planning in mice trained to respond to a brief whisker stimulus by licking after a delay. We found that upon learning, the sensory information, initially highly localized, rapidly spreads to diverse motor and higher-order areas, together with transient deactivation of orofacial regions, converging during the delay period to a focalized region of the frontal cortex. The secondary whisker motor cortex (wM2) appears as a key relay of this sensorimotor transformation, showing the earliest learning-enhanced response to the whisker stimulus. Our results suggest a specific cortical circuit with wM2 acquiring a pivotal role in transforming whisker information into preparatory activity for goal-directed motor planning.

Highlights
• Cortex-wide, task-epoch specific causal neuronal dynamics of sensorimotor learning
• Sensory information converges to a focal frontal region critical for delay-response
• Orofacial cortex acquired an inhibitory response with delayed lick learning
• Secondary whisker motor cortex is a key node converting whisker input to lick plan

INTRODUCTION
Incoming sensory information is processed in a learning- and context-dependent manner to direct goal-directed behavior, but the underlying neural circuit mechanisms are poorly understood. To dissect this process, it would be crucial to follow the causal chain of neuronal activity across brain areas as sensory information is transformed into goal-directed motor output (de Lafuente and Romo, 2006) and to examine how the underlying sensory and motor circuits become connected through learning.
Investigations of head-restrained behaving mice offer increasingly-detailed insights into mammalian brain function through precise stimulus control and monitoring of behavior (Carandini and Churchland, 2013), along with increasingly-sophisticated methods for measuring and manipulating neuronal activity (Luo et al., 2018;Pinto et al., 2019;Steinmetz et al., 2019).
Rodents gather extensive information about their immediate environment through their array of whiskers surrounding the snout (Bosman et al., 2011;Diamond et al., 2008;Feldmeyer et al., 2013;Petersen, 2019). Tactile information from whiskers is relayed to whisker somatosensory cortex (wS1) where each whisker is represented by an anatomically-defined unit, termed a barrel, forming a well-defined starting point for cortical processing of whisker-related sensory information. Although rodents can perform simple whisker-dependent tasks after permanent lesioning of wS1 (Hong et al., 2018;Hutson and Masterton, 1986), transient inactivation of wS1 impairs performance in tasks where they are trained to detect a brief whisker deflection (Hong et al., 2018;Miyashita and Feldman, 2013;Sachidhanandam et al., 2013;Yang et al., 2016). Whisker-detection task learning appears to be accompanied by a strengthening of interactions between wS1 and secondary whisker somatosensory cortex (wS2) (Kwon et al., 2016;Yamashita and Petersen, 2016), as well as the recruitment of multiple other brain regions (Kyriakatos et al., 2017;Le Merre et al., 2018;Sippy et al., 2015).
The simplest whisker-detection tasks allow mice to lick immediately after the whisker stimulus, giving rise to neuronal activity related to sensory, choice and motor components of behavior, which are difficult to disentangle (Musall et al., 2019). Imposing a delay between the presentation of the sensory stimulus and the motor response can help to better distinguish such mixed computations. Previous studies have reported prominent delay period activity in broad regions of frontal cortex (Chabrol et al., 2019;Chen et al., 2017;Erlich et al., 2011;Esmaeili and Diamond, 2019;Fassihi et al., 2017;Funahashi et al., 1989;Fuster and Alexander, 1971;Guo et al., 2014;Li et al., 2015;Makino et al., 2017;Tanji and Evarts, 1976). However, the various roles of distinct subregions of frontal cortex have not been studied in detail for any specific sensorimotor delay task across learning. Furthermore, the precise circuit mechanisms that give rise to such preparatory neuronal activity during the delay period in response to a sensory stimulus across learning remain to be investigated.
Moreover, recent studies suggest that uninstructed movements may contribute prominently to widespread neuronal activity (Musall et al., 2019;Stringer et al., 2019), and may be an important confound during many delayed-response tasks where preparatory movements could help animals to bridge the delay period. Thus, neuronal delay activity in some regions might reflect preparatory movement itself rather than the motor plan.
Here, we introduce a delayed whisker-detection learning paradigm. Through a unified analysis combining high-speed calcium imaging, large-scale neurophysiological recordings, temporally specific optogenetic inactivation, and rigorous behavioral and analytical methods, we detail the spatiotemporal map of causal cortical processing accompanying learning. We find causal contributions for initially localized sensory processing in wS1/wS2 rapidly spreading to multiple downstream cortical areas and converging in secondary tongue/jaw anterolateral motor cortex (tjM2/ALM), with the earliest learning-enhanced choice-related neuronal activity being found in the secondary whisker motor cortex (wM2).

RESULTS

Behavioral changes accompanying delayed-response task learning
To study essential neuronal mechanisms for reward-based sensorimotor transformation, we designed a well-controlled Go/No-Go learning paradigm where head-restrained mice learned to lick in response to a whisker stimulus after a one second delay period (Figure 1A-C). To precisely track the sequence of cortical responses, we used a single, short (10 ms) deflection of the C2 whisker. To uncover changes specific to the coupling of the whisker stimulus with the licking response, a two-phase learning paradigm was implemented: the "Pretraining" included only one trial type; trials were separated by 6-8 seconds and started after mice withheld licking for a variable "Quiet" window of 2-3 seconds. A visual and an auditory cue, separated by 2 seconds, signaled the start and end of each trial. Only licking after the auditory cue was rewarded, while licking before the auditory cue ("early lick") aborted the trial with a time-out punishment. The "Pretraining" established a basis for the task, where mice learned the general task structure while no whisker stimulus was delivered yet. In the second phase, "Whisker training", in half of trials (referred to as "Go" trials), the whisker stimulus was introduced one second after the visual onset and omitted in the other half ("No-Go" trials); only licking after the auditory cue in Go trials was rewarded. Early licks in all trials and licking in No-Go trials were punished with a time-out, resulting in longer inter-trial-intervals (Figure 1B). Thus, mice were required to withhold licking to initiate trials, detect the whisker stimulus in Go trials, remember it over a one second delay period, and lick after the auditory go cue was presented. This design allowed us to compare behavior and neuronal activity of mice before ("Novice") and after ("Expert") the whisker training phase. Both Novice and Expert mice were recorded in the final task conditions (Figure 1C, for details see STAR Methods).
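The trial logic described above can be summarized in a short sketch. This is only an illustration under stated assumptions, not the authors' behavioral control code: the function name `classify_trial`, the outcome labels, and the length of the response window (`response_window_s`) are hypothetical, whereas the cue timings (whisker stimulus 1 s and auditory cue 2 s after visual cue onset) follow the text.

```python
from typing import Optional

# Event times in seconds relative to visual cue onset (from the task description).
WHISKER_STIM_S = 1.0   # whisker stimulus, delivered in Go trials only
AUDITORY_CUE_S = 2.0   # auditory go cue, end of the delay period


def classify_trial(is_go: bool,
                   first_lick_s: Optional[float],
                   response_window_s: float = 1.0) -> str:
    """Label one trial from the time of the first lick (None if no lick).

    response_window_s is an assumed value; the text does not state how long
    after the auditory cue a lick still counts as a response.
    """
    # Any lick before the auditory cue aborts the trial ("early lick").
    if first_lick_s is not None and first_lick_s < AUDITORY_CUE_S:
        return "early_lick"
    licked = (first_lick_s is not None
              and first_lick_s <= AUDITORY_CUE_S + response_window_s)
    if is_go:
        return "hit" if licked else "miss"
    return "false_alarm" if licked else "correct_rejection"
```

For example, `classify_trial(True, 2.3)` gives a Hit, while `classify_trial(True, 1.5)` is an early lick because the lick falls inside the delay period.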
We quantitatively monitored the behavior of mice using a piezoelectric lick sensor and high-speed video filming (Figure 1A). The whisker training induced both instructed and uninstructed behavioral changes (Figures 1D-F and S1 [...] Figure S1B; Novice, p=0.14, Expert, p<0.01; Wilcoxon signed-rank test). This indicates that Expert mice used whisker information to make correct decisions, and predicted the timing of the auditory cue while avoiding early licks in the majority of trials. After whisker training, mice also adopted new movement strategies (Figures 1E-F, S1C-D). In Hit trials, mice decreased whisker movement before the whisker stimulus, possibly to improve the detection of passively-applied whisker stimuli in the "receptive mode" (Diamond and Arabzadeh, 2013;Kyriakatos et al., 2017). The whisker, tongue and jaw movements after the whisker stimulus increased in Hit trials, reflecting preparation for licking. These anticipatory movements were absent in Miss and Correct-rejection trials (Figure S1C).

Emergence of cortical activation and deactivation patterns through whisker training
The delay task enables the investigation of different aspects of neuronal computations underlying reward-based behavior, including sensory processing, motor planning and motor execution, in well-isolated time windows. As a first step, we mapped the spatiotemporal dynamics of cortical activity by wide-field calcium imaging at a high temporal resolution (100 frames per second, Figures 2 and S2). In transgenic mice expressing a fluorescent calcium indicator in pyramidal neurons (RCaMP mice) (Bethge et al., 2017), functional images of the left dorsal cortex were obtained through an intact skull preparation, and registered to the Allen Mouse Brain Common Coordinate Framework (Figure 2A-B) (Lein et al., 2007;Wang et al., 2020).
To examine the changes in cortical processing upon learning, we compared in the same mice the activity in correct trials before (Novice, 62 sessions) and after (Expert, 82 sessions) whisker training (Figures 2C-E for Hit trials and S2A-B for Correct-rejection trials, Videos S1-2). The visual cue evoked responses in the primary visual (Vis) and surrounding areas (Andermann et al., 2011;Marshel et al., 2011;Wang and Burkhalter, 2007), which decreased significantly after whisker training [...] 2D) followed by a widespread gradual increase toward the auditory cue initiating in the primary and secondary motor areas for whisker (wM1, wM2) and tongue/jaw (tjM1, tjM2/ALM), as well as posterior parietal cortex (PPC) and limb/trunk areas (Figure 2C and E [...]).
To control for hemodynamic effects of the wide-field fluorescence signal, we also imaged transgenic mice expressing an activity-independent red fluorescent protein, tdTomato (Figure S3A-B; 57 sessions from 7 Expert mice). The tdTomato control mice showed significantly smaller signals than the RCaMP mice did (subtraction between RCaMP and tdTomato mice images, threshold p<0.05, Wilcoxon rank-sum test, FDR-corrected). In visual cortex of both RCaMP and tdTomato mice, negative intrinsic signals were evoked around 1 s after the visual stimulus. However, the whisker stimulation evoked the rapid positive sensory response only in RCaMP mice, and no clear response was evoked in tdTomato mice, likely because of the weak stimulation amplitude. On the other hand, some positive intrinsic optical signals were evoked in midline and frontal regions of tdTomato mice, but the amplitude of these signals was significantly smaller than for RCaMP mice (threshold p<0.05, Wilcoxon rank-sum test, FDR-corrected). These results suggest that the spatiotemporal patterns of fluorescence signals in RCaMP mice largely reflected the calcium activity of the cortex.
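Pixel-wise comparisons of this kind involve many simultaneous tests, which is why the p-values are FDR-corrected. The text does not state which FDR procedure was used; the sketch below assumes the common Benjamini-Hochberg step-up procedure, and the function name `benjamini_hochberg` is hypothetical. It takes one p-value per pixel (or per area) and returns which tests survive at the chosen FDR level.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (assumed FDR method).

    Returns a list of booleans, True where the null hypothesis is rejected
    while controlling the false discovery rate at level alpha.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k such that p_(k) <= alpha * k / m.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            max_rank = rank
    # Reject every hypothesis whose rank is at most max_rank.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected
```

Because the procedure is step-up, a p-value that fails its own threshold can still be rejected when a larger p-value passes a later threshold, as in `benjamini_hochberg([0.01, 0.02, 0.03, 0.04])`, where all four tests are rejected.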

Distinct modification of early and late whisker processing in single neurons
To further investigate learning-related cortical dynamics at the level of spiking activity of single units and with higher temporal resolution we carried out large-scale silicon probe recordings (Buzsáki, 2004). Our recordings targeted 12 brain regions with guidance from wide-field calcium imaging (Figures 2 and S2), optical intrinsic imaging and previous literature (Esmaeili and Diamond, 2019;Guo et al., 2014;Harvey et al., 2012;Kyriakatos et al., 2017;Le Merre et al., 2018;Mayrhofer et al., 2019;Sippy et al., 2015;Sreenivasan et al., 2016): visual (Vis), whisker somatosensory (wS1 and wS2) and auditory (Aud) cortices; motor areas related to whisker (wM1 and wM2) and tongue/jaw (tjM1 and tjM2/ALM); the dorsolateral region of striatum (DLS) innervated by wS1; higher-order areas of posterior parietal cortex (PPC), medial prefrontal cortex (mPFC) and a dorsal part of hippocampal area CA1 (dCA1) (Figures 3A and S4A). In any given session we recorded from two areas simultaneously, and the precise anatomical location was determined by reconstructing three-dimensional coordinates of the silicon probe tracks through histological analysis, two-photon tomography, and registration to the Allen atlas (Figures 3A and S4A; for details, see STAR Methods) (Lein et al., 2007;Wang et al., 2020). In total, we recorded 4,415 regular spiking units (RSUs) in 22 Expert mice, and 1,604 RSUs in 8 Novice mice.
Single neurons in different areas fired spikes in specific moments of the task such as whisker sensory processing, lick preparation and lick execution (Figure 3B). To reveal neuronal firing changes through whisker training, we calculated time-dependent mean firing rate for all recording probes (Figure 3C and Videos S5-6) and for the 12 anatomically defined areas (Figure 3D). The visual cue evoked a long-lasting response in Vis and PPC of Novice and Expert mice. The whisker stimulus evoked an early widespread excitation across whisker sensorimotor areas (wS1, wS2, wM1, wM2), as well as PPC, DLS and tjM2/ALM. Following the auditory cue, excitation rapidly covered all recorded regions.
Major changes induced by whisker training appeared in the delay period between the whisker and auditory stimuli. The initial excitation was significantly enhanced in wM2 and tjM2/ALM. An early excitatory response in PPC and a transient suppression of firing in tjM1 also appeared across learning (non-parametric permutation test). Firing rates of all areas in Novice mice returned to baseline levels shortly after whisker stimulation, whereas in Expert mice wS2, PPC, DLS, wM2, tjM2/ALM and tjM1 neurons showed increased activity in different parts of the delay. PPC neuronal firing remained elevated only during the first part of the delay period, returning to baseline before the auditory cue, while the activity of wM2, DLS and tjM1 neurons ramped up towards the lick onset. On average, neurons in tjM2/ALM maintained elevated firing throughout the entire delay period. These results suggest that the whisker training enhanced the initial distributed processing of the whisker stimulus, and formed the memory of a licking motor plan among higher-order areas of whisker and tongue/jaw motor cortex.
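The Novice versus Expert comparisons above rely on non-parametric permutation tests. The exact test statistic and number of permutations are not given in this section, so the following is a minimal sketch under assumptions: a two-sided test on the difference in mean firing rate between two groups of neurons, with the hypothetical name `permutation_test` and an arbitrary default of 10,000 shuffles.

```python
import random


def permutation_test(group_a, group_b, n_permutations=10000, seed=0):
    """Two-sided permutation test on the difference of group means.

    group_a and group_b are sequences of firing rates (e.g. one value per
    neuron in a given time bin). The test statistic and the permutation
    count are assumptions, not the authors' stated settings.
    """
    rng = random.Random(seed)
    n_a = len(group_a)
    n_b = len(group_b)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)
    pooled = list(group_a) + list(group_b)
    n_extreme = 0
    for _ in range(n_permutations):
        # Shuffle group labels by shuffling the pooled data.
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed:
            n_extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (n_extreme + 1) / (n_permutations + 1)
```

Repeating such a test for every area and time bin would again call for a multiple-comparison correction.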
We further investigated how selectively those neurons were recruited for task execution by considering other trial types. First, we found that the delay period activity was absent in Miss trials (Figure S4B-C). Second, we quantified response selectivity of individual neurons for Hit and Correct-rejection trials based on ROC analysis (Figure 3E; see STAR Methods), and found in most areas that a larger population of neurons was selectively recruited during the delay period in Expert than in Novice mice. These results suggest the involvement of the acquired neuronal delay period activity in correct whisker detection and delayed-licking. Silicon probe recordings thus provide consistent and complementary information to wide-field calcium imaging.
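As an illustration of ROC-based selectivity, the area under the ROC curve (AUC) for one neuron can be computed directly from its spike counts in Hit and Correct-rejection trials. The details of the authors' analysis are in the STAR Methods and are not reproduced here; the rescaling of AUC to a selectivity index between -1 and 1 and the function names below are assumptions made for this sketch.

```python
def roc_auc(hit_counts, cr_counts):
    """Area under the ROC curve separating Hit from Correct-rejection trials.

    Computed as the probability that a randomly chosen Hit trial has a
    higher spike count than a randomly chosen Correct-rejection trial,
    counting ties as one half.
    """
    wins = 0.0
    for h in hit_counts:
        for c in cr_counts:
            if h > c:
                wins += 1.0
            elif h == c:
                wins += 0.5
    return wins / (len(hit_counts) * len(cr_counts))


def selectivity_index(hit_counts, cr_counts):
    """Rescale AUC to [-1, 1]; 0 means no selectivity (assumed convention)."""
    return 2.0 * (roc_auc(hit_counts, cr_counts) - 0.5)
```

A neuron firing more in every Hit trial than in any Correct-rejection trial has an index of 1, and a neuron with identical count distributions has an index of 0.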

Stable and acquired patterns of neuronal firing
The overall population activity patterns in different brain areas were thus profoundly different (Figures 2 and 3), but even neurons recorded from the same silicon probe could show striking diversity. Assuming that neurons with similar firing dynamics perform similar processing, it is informative to identify those temporal patterns and investigate whether a single pattern is confined to one area or distributed across the brain. Clustering after dimensionality reduction with PCA and spectral embedding (Figure S5A-B) yielded 24 clusters of neurons, which were sorted by their onset latency (Figure 4A) (Hastie et al., 2009).
In agreement with the observation of profound changes during the delay period after whisker training (Figures 1-3), neuronal clustering delineated several whisker responsive clusters with distinct temporal dynamics (Figure 4A, clusters C2-C7 [...] Figures 4D and S5D). These results suggest that the processing of the whisker stimulus during the delay period was initially confined to whisker sensorimotor areas, then widely distributed, and finally converged in the frontal areas for whisker, tongue/jaw and DLS to plan licking.

Focalized delay period activity in frontal cortex
The most prominent cortical change after whisker training was the emergence of widespread delay period activity patterns at the level of population and single neurons. To identify neural activities more directly related to the task execution, we leveraged trial-by-trial variability of the neuronal activity and anticipatory movements (Figure 5). First, we separated neural activities by selecting "Quiet" trials in which mice did not make jaw movements during the delay period (Figures 5A-B, S6A; see STAR Methods). When only Quiet trials were considered, the widespread calcium activity during the delay, observed in the average of all trials, became more localized to wM2 and tjM2/ALM (Figure 5A). This focal activation together with deactivation of orofacial sensorimotor areas emerged by learning (threshold p<0.05, Wilcoxon rank-sum test, FDR-corrected). Electrophysiology data also demonstrated a consistent localization of the neuronal delay period activity (Figure 5B). [...] S6D) (Buse, 1982).
Whisker sensorimotor areas (wS1, wS2, wM1 and wM2), in both Novice and Expert mice, had the largest proportion of neurons significantly modulated by the whisker stimulus in the first 100 ms (Whisker encoding neurons, Figure 5C top). Among these areas, the fraction of Whisker encoding neurons increased across whisker training only in wM2 (p=0.032, Pearson's chi-square test), while it decreased in wS2 (p=0.029) and wM1 (p=0.014). In contrast, Delay encoding neurons that were significantly modulated between 100 ms and 1 s after the whisker stimulus (Figure 5C, middle) were mainly found in tjM2/ALM, but also in wM2, which was strikingly enhanced by whisker training (p<10⁻⁵). Some neurons in wM2, tjM2/ALM, tjM1 and DLS were found to be significantly modulated during the 200 ms prior to the lick onset before and after whisker training (Figure 5C; PreLick encoding neurons), reflecting the licking initiation signal in these areas beyond those captured by orofacial movements or sound onset predictors in the model. We then asked whether similar populations of neurons encode different task variables in each area (Figure 5D). To address this question, we quantified the degree of overlap across populations of Whisker, Delay and PreLick encoding neurons in the key areas of interest and visualized it using Venn diagrams (Figure 5D). We found that enhanced Delay and PreLick encoding populations were largely non-overlapping.
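The comparison of encoding-neuron fractions between Novice and Expert mice can be illustrated with Pearson's chi-square test on a 2x2 contingency table. The sketch below assumes no continuity correction, and the function name `chi_square_2x2` is hypothetical; with one degree of freedom, the p-value follows from the complementary error function.

```python
import math


def chi_square_2x2(k_novice, n_novice, k_expert, n_expert):
    """Pearson's chi-square test (1 degree of freedom, no continuity
    correction, which is an assumption) comparing two proportions.

    k_* is the number of encoding neurons, n_* the total number of neurons.
    Requires at least one encoding and one non-encoding neuron overall,
    otherwise an expected count is zero.
    """
    observed = [[k_novice, n_novice - k_novice],
                [k_expert, n_expert - k_expert]]
    total = n_novice + n_expert
    row_totals = [n_novice, n_expert]
    col_totals = [k_novice + k_expert, total - k_novice - k_expert]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 dof.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

With purely illustrative counts of 10 of 100 encoding neurons in Novice and 30 of 100 in Expert mice, the statistic is 12.5 and the p-value is below 0.001.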
Finally, we asked whether our encoding model, fitted using all trials, can reproduce neuronal activity in Quiet trials (Figure 5E). Model-reconstructed PSTHs after removing movement-related regressors confirmed that neurons in tjM2/ALM kept their firing throughout the delay period, while the firing in other areas returned to baseline, in agreement with the empirical data. This result supports the validity of the model and highlights the prominence of tjM2/ALM.

Routing of whisker information to frontal cortex
tjM2/ALM appears to be the most important processing node during the delay period, even persisting in Quiet trials (Figure 5A-B) and after accounting for preparatory movements (Figure 5C-D). However, neuronal activity diverges between Novice vs Expert and between Hit vs Miss trials at a very early phase of the delay period (Figures 2, 3, S2 and S4), suggesting this as a critical window for decision making. We therefore investigated early neuronal delay period activity by tracking the cortical pathway that signals whisker input to tjM2/ALM by following the rapid sequence of the whisker-evoked responses among anatomically connected regions (Figure 6). Frame-by-frame analysis of high-speed calcium imaging data showed that the whisker stimulus evoked the earliest responses in wS1 at 20 ms after the stimulus onset; activity then spread to wS2, wM1, wM2, Aud, PPC, retrosplenial area and finally reached tjM2/ALM (Figures 6A and S7A). This earliest sequence including the deactivation of tjM1/S1 was significantly enhanced by whisker training (Figure 6A, threshold p<0.05 by Wilcoxon rank-sum test, FDR-corrected), but was diminished when mice failed to lick (Figure S7C-D, threshold p<0.05 by Wilcoxon rank-sum test, FDR-corrected), indicating its possible involvement in whisker-detection and delayed-licking. Neuronal firing showed a consistent spatiotemporal pattern of the whisker-evoked sequence, including a suppression of firing in tjM1 and its enhancement by whisker training (Figure S7B).
Sorting population responses in different areas according to latency revealed a clear sequential development of firing within 100 ms from wS1 to tjM2/ALM (Figure 6B), which was mostly conserved between Novice and Expert. However, the single neuron latency in wM1 was delayed, whereas it was shortened in wM2 after whisker training (Figure 6C, wM1: p=0.008, wM2: p=0.041, Wilcoxon rank-sum test, FDR-corrected). Moreover, among all areas recorded, wM2 showed the earliest significant increase in firing upon whisker training, as well as the earliest significant difference comparing Hit and Miss trials (Figure 6D, Novice vs Expert: p=0.022, Hit vs Miss: p=0.025, non-parametric permutation test, FDR-corrected). Altogether, these results highlight the role of wM2 as a potential node to bridge sensory to motor activity by relaying whisker sensory information from wS1/wS2 to tjM2/ALM.

Temporally-specific causal contributions of different brain regions
Imaging and electrophysiology data suggested multiple phases of neural processing for whisker-detection, motor planning and delayed-licking. To examine the causal contribution of brain regions in each of these phases, we performed spatiotemporally-selective optogenetic inhibition (Figure 7A). In transgenic mice expressing ChR2 in GABAergic neurons (n=9, VGAT-ChR2) (Guo et al., 2014), we applied blue light pulses to each brain region through an optical fiber. The blue light was delivered randomly in one-third of the trials, occurring in one of the four temporal windows: Baseline (from visual cue onset to 100 ms before whisker stimulus onset), Whisker (from 100 ms before to 200 ms after whisker stimulus onset), Delay (from 200 ms to 1000 ms after whisker stimulus onset), or Response (from 0 ms to 1100 ms after auditory cue onset). [...] Figure 5C). The differential impact of inactivating nearby cortical regions is consistent with high spatiotemporal specificity of our optogenetic manipulations. Thus, spatiotemporal mapping of causal impacts suggested that critical whisker processing was initially distributed across diverse cortical regions, and then converged in frontal regions for planning lick motor output, in agreement with neural activity data.
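The four inactivation windows can be written on a single time axis relative to whisker stimulus onset, using the task timing given earlier (visual cue 1 s before and auditory cue 1 s after the whisker stimulus). The sketch below is illustrative only: the names `OPTO_WINDOWS_MS`, `window_at` and `draw_light_window`, the half-open interval convention, and the uniform choice among the four windows are assumptions not stated in the text.

```python
import random

# Times in ms relative to whisker stimulus onset.
VISUAL_CUE_MS = -1000    # visual cue precedes the whisker stimulus by 1 s
AUDITORY_CUE_MS = 1000   # auditory cue follows the whisker stimulus by 1 s

OPTO_WINDOWS_MS = {
    "Baseline": (VISUAL_CUE_MS, -100),
    "Whisker": (-100, 200),
    "Delay": (200, 1000),
    "Response": (AUDITORY_CUE_MS, AUDITORY_CUE_MS + 1100),
}


def window_at(t_ms):
    """Return the window containing t_ms, or None if outside all windows.

    Intervals are treated as half-open [start, stop), an assumed convention.
    """
    for name, (start, stop) in OPTO_WINDOWS_MS.items():
        if start <= t_ms < stop:
            return name
    return None


def draw_light_window(rng):
    """Pick a light condition for one trial: light in one-third of trials,
    with the window chosen uniformly at random (uniformity is an assumption)."""
    if rng.random() < 1.0 / 3.0:
        return rng.choice(list(OPTO_WINDOWS_MS))
    return None
```

Written this way, the windows tile the trial from visual cue onset to 1100 ms after the auditory cue without overlap.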
If a brain region is critically involved in task execution, neural activity in that area would code behavioral decision, and its inhibition would cause behavioral impairments. To further evaluate the correlation between selective neuronal firing and its causal contribution to behavior, we defined an involvement index for each area and time window as the product of the difference in firing rate between Hit versus Correct-rejection and the change in Hit rate upon optogenetic inhibition (Figure 8A). The involvement index during the Whisker period was largest in wS2 and wS1 (wS2 and wS1 versus other areas, p<0.01, non-parametric permutation test), highlighting these areas as the main nodes of whisker sensory processing. During the Delay period, tjM2/ALM had the largest involvement index (tjM2/ALM versus other areas, p<0.001, non-parametric permutation test). The most critical area in the Response window was tjM1 (tjM1 versus other areas, p<0.05, non-parametric permutation test).
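The involvement index defined above is a simple product, which can be sketched as follows. The sign convention (hit rate with light off minus hit rate with light on, so that an impairment yields a positive change), the absence of any normalization, and the function name `involvement_index` are assumptions; the text specifies only that the index is the product of the two quantities.

```python
def involvement_index(rate_hit_hz, rate_cr_hz,
                      hit_rate_light_off, hit_rate_light_on):
    """Product of neuronal selectivity and behavioral impact of inhibition
    for one area and one time window.

    rate_hit_hz, rate_cr_hz: mean firing rates in Hit and Correct-rejection
    trials. hit_rate_light_*: fraction of Go trials with a correct lick.
    Sign convention and lack of normalization are assumptions of this sketch.
    """
    selectivity = rate_hit_hz - rate_cr_hz
    hit_rate_drop = hit_rate_light_off - hit_rate_light_on
    return selectivity * hit_rate_drop
```

Under this convention the index is large only when an area both fires selectively in Hit trials and impairs performance when inhibited; either factor being zero gives an index of zero.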

DISCUSSION
We found converging evidence for the temporally distinct involvement of diverse cortical regions in delayed sensorimotor transformation using five comprehensive and complementary technical approaches: i) a two-step learning paradigm to differentiate specific computations acquired for converting a sensory stimulus to a motor plan, ii) high-speed video filming of orofacial movements to account for movement-related neuronal activity during the delay, iii) high-speed wide-field optical calcium imaging, iv) cortex-wide silicon probe single-neuron recordings, and v) task epoch- and area-specific optogenetic inactivation. Our analyses focused on learning-induced changes in causal neural activity transforming whisker-deflection evoked sensory responses into delay period activity for motor planning, leading to a specific hypothesis for the underlying neuronal circuit mechanisms discussed below.

Localized preparatory neuronal activity in frontal cortex
In the present delayed-response task, mice learned to detect a brief whisker deflection and lick a water spout after one second. Broad regions of cortex showed elevated activity in Expert mice during parts of the delay period in Hit trials (Figures 2 and 3). By clustering neuronal firing patterns, we found that delay-period responsive neurons (cluster C6) were located predominantly in tjM2/ALM, as well as wM2, wM1 and DLS [...] possibly present in many similar delayed-response tasks. At the same time, we also found that large parts of the spiking activity in many regions were correlated with orofacial movements (Table S1), in agreement with previous studies which reported [...] given that there is no task-related neuronal delay activity in tjM2/ALM of Novice mice, it is unlikely that this region is important during the delay period in Novice mice. Our study thus confirms and further extends previous studies demonstrating the critical role of ALM in motor planning in delayed-response tasks (Guo et al., 2014;Li et al., 2015).

Lick and No-Lick signals in tjM1
In Expert mice, we found that the whisker stimulus evoked a sharp deactivation broadly across orofacial sensorimotor cortex including tjM1, an area thought to be involved in [...] and 8A) and highlighting the functional and causal distinctions between nearby frontal regions.

Initiation of persistent activity by wM2
In the present behavioral paradigm, learning the relevance of the C2 whisker was the critical step for mice. The well-controlled sensory stimulus allowed the step-by-step investigation of sensory propagation. The earliest cortical response to the whisker stimulus occurred in wS1 and wS2, which changed relatively little across whisker training [...] This early delay window after the whisker stimulus seems to be the critical period for decision making and when the whisker information is converted to preparatory neuronal activity in Hit but not Miss trials (Figures S4 and 6). Within this period, wM2 showed the earliest significant increase in whisker-evoked firing (< 50 ms) in both Novice vs Expert and Hit vs Miss comparisons (Figure 6). Thus, wM2 might serve as a key node in the corticocortical network to begin the process of converting a whisker sensory stimulus into longer-lasting preparatory neuronal activity. Shortly after wM2 activation, tjM2/ALM, an important premotor area for control of licking (Guo et al., 2014;Li et al., 2015;Mayrhofer et al., 2019), started to increase firing (Figure 6). Through cortico-cortical connectivity (Luo et al., 2019), activity in wM2 could contribute directly to exciting the neighboring region tjM2/ALM, which manifested the most prominent delay period activity through whisker training (Figures 4 and 5), consistent with previous studies (Li et al., 2015).

A cortico-cortical network for learned sensorimotor planning
Our results suggest a hypothesis for a minimal cortical network connecting whisker sensory coding to preparatory neuronal activity for motor planning: a pathway wS1 -> wS2 -> wM2 -> tjM2/ALM could be the main stream of signal processing (Figure 8B).
Some of the most prominent whisker-related changes through whisker training occurred in wM2 and tjM2/ALM, and it is possible that reward-related potentiation of synaptic transmission between wS2 -> wM2 and wM2 -> tjM2/ALM could underlie important aspects of the present learning paradigm. All of these cortical areas are likely to be connected through reciprocal excitatory long-range axonal projections, which could give rise to recurrent excitation helping to prolong firing rates of neurons in relevant brain regions during the delay period of Hit trials. Enhanced reciprocal excitatory connectivity amongst wS1 <-> wS2 <-> wM2 <-> tjM2/ALM could be mediated by Hebbian types of synaptic plasticity. Interestingly, in a related whisker detection task without a delay period, enhanced reciprocal signaling between wS1 and wS2 has already been proposed to play an important role (Kwon et al., 2016;Yamashita and Petersen, 2016). It is also important to note that a large number of subcortical structures are also likely to be involved in task performance including thalamus (El-Boustani et al., 2020;Guo et al., 2017), basal ganglia (Sippy et al., 2015) and cerebellum (Chabrol et al., 2019;Gao et al., 2018).
In summary, here, we found evidence that learning induces a highly-dynamic initial spread of sensory processing across cortical areas followed by a convergence of activity supporting a motor plan in a localized region of frontal cortex, and we propose the causal contribution of a specific cortico-cortical circuit, which we hypothesize could be strengthened across learning by reward-driven synaptic plasticity.

Declaration of Interests
The authors declare no competing interests.

See also Figure S1.
See also Figure S2-3 and Videos S1-4.
Significance of selectivity was determined using non-parametric permutation tests (p<0.05). Note the increase in percentage of selective neurons in wM2 early after whisker stimulus and in tjM2/ALM throughout the delay period.
See also Figure S4 and Videos S5-6.
See also Figure S5.
See also Figure S6 and Table S1.
See also Figure S7.
(B) Proposed cortical circuits connecting whisker somatosensory cortex to tongue/jaw anterolateral secondary motor cortex upon task learning.

LEAD CONTACT AND MATERIALS AVAILABILITY
Further information and requests for resources and reagents should be directed to and will be fulfilled by the Lead Contact, Carl Petersen (carl.petersen@epfl.ch). This study did not generate new unique reagents.

[...] weeks. All mice were weighed and inspected daily during behavioral training.

Experimental design
This study did not involve randomization or blinding. We did not estimate sample-size before carrying out the study. However, the sample-size in this study is comparable with those used in related studies (Allen et al., 2017;Guo et al., 2014;Harvey et al., 2012;Hattori et al., 2019;MacDowell and Buschman, 2020;Pinto et al., 2019).

Implantation of metal headpost
Mice were deeply anesthetized with isoflurane (3% with O2) and then were maintained under anesthesia using a mixture of ketamine and xylazine injected intraperitoneally (ketamine: 125 mg/kg, xylazine: 10 mg/kg). Carprofen was injected intraperitoneally (100 µl at 0.5 mg/ml) for analgesia before the start of surgery. Body temperature was kept at 37°C throughout the surgery with a heating pad. An ocular ointment (VITA-POS, Pharma Medica AG, Switzerland) was applied over the eyes to prevent them from drying. As a local analgesic, a mix of lidocaine and bupivacaine was injected below the scalp before any surgical intervention. A povidone-iodine solution (Betadine, Mundipharma Medical Company, Bermuda) was used for skin disinfection. To expose the skull, a part of the scalp was removed with surgical scissors. The periosteal tissue was removed with cotton buds and a scalpel blade. After disinfection with Betadine and rinsing with Ringer solution, the skull was dried well with cotton buds. A thin layer of super glue (Loctite super glue 401, Henkel, Germany) was then applied across the dorsal part of the skull and a custom-made head fixation implant was glued to the right hemisphere without a tilt and parallel to the midline. A second thin layer of the glue was applied homogeneously on the left hemisphere. After the glue had dried, the head implant was further secured with self-curing denture acrylic (Paladur, Kulzer, Germany; Ortho-Jet, LANG, USA). For electrophysiological recordings, a chamber was made by building a wall with denture acrylic along the edge of the bone covering the left hemisphere. Particular care was taken to ensure that the left hemisphere of the dorsal cortex was free of denture acrylic and only covered by super glue for optical access.
This intact, transparent skull preparation was used to perform wide-field calcium imaging as well as intrinsic optical signal (IOS) imaging experiments. The animal was returned to its home cage and ibuprofen (Algifor Dolo Junior, VERFORA SA, Switzerland) was given in the drinking water for three days after surgery.

Skull preparation and craniotomies
For wide-field calcium imaging, an intact transparent skull was used as described above. For electrophysiological recordings, up to 10 small craniotomies were made over the regions of interest using a dental drill under isoflurane anesthesia (2-3% in

Behavioral paradigm
A total of 49 mice were examined in the delayed whisker detection task including 9 RCaMP, 24 wild-type or negative, 9 VGAT-ChR2 and 7 tdTomato mice. During the behavioral experiments, all whiskers were trimmed except for the C2 whiskers on both sides, and the mice were water restricted to 1 ml of water/day. Mice were trained daily with one session/day and their weight and general health status were carefully monitored using a score sheet. Both groups of mice (Expert and Novice) went through a 'pre-training' phase which consisted of trials with visual and auditory cues (without any whisker stimulus) (Figure 1C). Mice were rewarded for licking a spout, placed on their right side, in a 1-second response window after the auditory cue onset. Trials were separated by 6-8 seconds and started after a quiet period of 2-3 seconds in which mice did not lick the spout. Each trial consisted of a visual cue (200-ms green LED) and an auditory cue (200-ms 10-kHz tone of 9 dB added on top of the continuous background white noise of 80 dB). The stimuli were separated by a delay period which was gradually increased to 2 seconds. Licking before the response period (i.e. 'early lick') aborted the trial and introduced a 3-5 second timeout. After 3-6 days of pre-training, mice learned to lick the spout by detecting the auditory cue and suppressed early licks.
The wide-field imaging and electrophysiological recordings from the 'Novice' group of mice were performed when mice had finished the pre-training phase and were introduced to the whisker delay task (Figure 1C). In this phase a whisker stimulus (10-ms Gaussian pulse through a glass tube attached to a piezoelectric driver) was delivered to the right C2 whisker 1 second after the visual cue onset in half of the trials.
Importantly, the reward was available only in trials with the whisker stimulus, referred to as 'Go', and a time-out punishment (together with an auditory buzz tone) was given when mice licked in trials without the whisker stimulus, referred to as 'No-Go' (Figure 1B). Thus, mice were required to use the whisker stimulus to change their lick/no-lick behavior. Since the whisker stimulus was very weak, Novice mice continued licking in most Go and No-Go trials irrespective of the whisker stimulus and did not show any sign of learning (Figure 1D and S1B).
The Expert mice entered a 'whisker-training' phase of 2-29 days during which a stronger whisker stimulus (larger amplitude and/or train of pulses) was introduced (Figure S1A). As the mice learned to lick correctly, the whisker stimulus amplitude was gradually reduced to a smaller amplitude, eventually matching that delivered to Novice mice. Expert mice decreased licking in No-Go trials but increased their premature early licks after the whisker stimulus as monitored by the piezoelectric lick sensor and behavioral filming (Figure 1D and Figure S1B, see below). Behavioral hardware control and data collection were carried out using data acquisition boards (National Instruments, USA) and custom-written Matlab code (MathWorks).

Quantification of orofacial movements
Contacts of the tongue with the reward spout were detected by a piezo-electric sensor.
Continuous movements of the left C2 whisker, tongue and jaw were filmed by a high- [...]
[...] (Han et al., 2018) to register brain volumes and probe locations to the Allen mouse brain atlas.
For some brains with DiI tracks, 100 µm thick serial sections were cut on a conventional vibratome. The slices were then mounted and imaged under a fluorescence microscope (Leica DM5500). MATLAB-based software (Allen CCF tools, https://github.com/cortex-lab/allenCCF) was used to register brain slices and probe locations to the Allen mouse brain atlas (Shamash et al., 2018).

Wide-field imaging data
Sessions in which the difference between the Hit rate and False-alarm rate was larger than 0.1 or smaller than 0.2 were excluded from Novice and Expert sessions, respectively. In total, 62 Novice sessions and 82 Expert sessions from 7 RCaMP mice, and 57 Expert sessions from 7 tdTomato mice were used for analysis. Acquired images were down-sampled to 77x96 pixels (111 μm/pixel). For each trial, we calculated the normalized signal intensity of each pixel as ΔF/F0 = (F-F0)/F0, where F is the intensity of a pixel in each frame, and F0 is the mean intensity of that pixel during the 1 second baseline period immediately before the onset of the visual cue. In each imaging session, mean ΔF/F0 images for different trial outcomes (Hit, Miss, False-alarm and Correct-rejection trials) were calculated by averaging "All" trials of each trial type, or by averaging "Quiet" trials in which the mean jaw speed during the 1 second delay period after the whisker stimulus did not exceed 4 times the mean absolute deviation of the jaw speed (angle) during the 1 second baseline period in each trial. Images from different mice were horizontally shifted according to the functionally-identified C2-barrel (RCaMP mice) (Mayrhofer et al., 2019) and the cerebellar tentorium (RCaMP and tdTomato mice), and smoothed with a spatial Gaussian filter (sigma = 1 pixel, 111 μm).
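The per-pixel normalization and the "Quiet" trial criterion described above can be illustrated with a minimal Python sketch. This is not the authors' Matlab code: the function names, the list-based data layout and the assumption that the baseline frames come first in each trace are hypothetical choices made for illustration.

```python
def delta_f_over_f(trace, n_baseline_frames):
    """Return (F - F0) / F0 for one pixel.

    trace: fluorescence values of one pixel across frames of one trial.
    n_baseline_frames: number of leading frames that make up the baseline
    (assumed here to be the frames immediately before the visual cue).
    """
    f0 = sum(trace[:n_baseline_frames]) / n_baseline_frames
    return [(f - f0) / f0 for f in trace]


def mean_absolute_deviation(values):
    """Mean absolute deviation around the mean."""
    mean = sum(values) / len(values)
    return sum(abs(v - mean) for v in values) / len(values)


def is_quiet_trial(jaw_speed_baseline, jaw_speed_delay, factor=4.0):
    """True if the mean jaw speed in the delay period does not exceed
    `factor` times the mean absolute deviation of the baseline jaw speed."""
    threshold = factor * mean_absolute_deviation(jaw_speed_baseline)
    mean_delay_speed = sum(jaw_speed_delay) / len(jaw_speed_delay)
    return mean_delay_speed <= threshold
```

For example, a pixel trace `[100, 100, 110, 90]` with two baseline frames has F0 = 100 and gives ΔF/F0 values of 0, 0, 0.1 and -0.1.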
Those trial-averaged images in each session were used as individual samples for statistical analysis. To test statistical differences in the pixel values between different conditions, Wilcoxon's rank-sum test was performed in each pixel, and the p-value was corrected for multiple comparisons by false-discovery rate, FDR (Benjamini and Hochberg, 1995). [...] 6D) and Hit vs Miss (Figure S3 and 6E) in each area was identified using non-parametric permutation tests in 50-ms bins and p-values were corrected by FDR.
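As an illustration of the FDR correction cited above, the following Python sketch implements the Benjamini-Hochberg step-up procedure on a list of p-values, for instance one per pixel or per time bin. The function name and the default level of 0.05 are assumptions made for the example, not details taken from the authors' code.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans marking which hypotheses are rejected
    while controlling the false-discovery rate at level alpha."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k such that p_(k) <= alpha * k / m.
    largest_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= alpha * rank / m:
            largest_rank = rank
    # Reject every hypothesis whose rank is at most that largest rank.
    rejected = [False] * m
    for rank, index in enumerate(order, start=1):
        if rank <= largest_rank:
            rejected[index] = True
    return rejected
```

The step-up rule rejects all hypotheses up to the largest rank that passes its threshold, so a p-value can be rejected even if it fails its own rank-specific threshold, provided a larger p-value passes.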

Receiver Operating Characteristic (ROC) analysis
To quantify the selectivity of single units for Go vs No-Go trials, we built ROC curves comparing the distributions of spiking activity in bins of 100 ms, including only correct trials (Hit and Correct-rejection). The area under the ROC curve was then compared to a baseline distribution (5 bins of 100 ms before visual cue onset) to examine the significance of selectivity beyond baseline fluctuations. Non-parametric permutation tests were performed, p-values were corrected by FDR, and the percentage of significant neurons in each area was identified (p<0.05, FDR-corrected, Figure 3E).
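The area under the ROC curve for one neuron and one time bin can be computed directly from pairwise comparisons of spike counts, as in the hedged Python sketch below. It covers only the AUC itself; the comparison against the baseline distribution and the permutation test are not reproduced here, and the function name is hypothetical.

```python
def roc_auc(go_counts, nogo_counts):
    """Area under the ROC curve separating Go from No-Go spike counts.

    Equals the probability that a randomly chosen Go trial has a higher
    spike count than a randomly chosen No-Go trial, counting ties as 0.5.
    A value of 0.5 means no selectivity.
    """
    wins = 0.0
    for go in go_counts:
        for nogo in nogo_counts:
            if go > nogo:
                wins += 1.0
            elif go == nogo:
                wins += 0.5
    return wins / (len(go_counts) * len(nogo_counts))
```

Values above 0.5 indicate higher firing in Go (Hit) trials, and values below 0.5 indicate higher firing in No-Go (Correct-rejection) trials.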

Clustering neuronal responses
For clustering the neuronal response patterns, RSUs from both Novice and Expert mice (1) with more than 200 spikes throughout the recording, and (2) with more than 5 trials for each trial type (i.e. Hit, Miss, CR and FA) were included in the analysis (n=5405 out of 6019 RSUs). For each neuron and each trial type, time-varying PSTHs (100 ms bin size) were computed over a 4 second window starting 1 second before the visual cue and lasting until 1 second after the auditory cue. PSTHs from different trial types were baseline subtracted, normalized to the range of values across all bins and then concatenated, resulting in an activity matrix $X \in \mathbb{R}^{5405 \times 160}$ in which each row corresponds to the concatenated normalized firing rate of one neuron across the different trial types (Figure S4A). Other normalization methods such as z-scoring resulted in similar clustering outcomes. To reduce the redundancy between firing rate time bins, we used Principal Component Analysis (PCA) and linearly projected the firing rate vectors onto a low-dimensional space. We applied PCA to the centered version of $X$ (i.e. $X - \bar{X}$) and found 14 significant components (permutation test with Bonferroni correction controlling the family-wise error rate at 0.05) (Macosko et al., 2015). The weight of different components was equalized by normalizing the data, resulting in unit variance for each component ($X' \in \mathbb{R}^{5405 \times 14}$).
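The construction of one row of the activity matrix can be sketched in Python as follows. The sketch assumes that the baseline is the first 10 bins of each PSTH (the 1 second before the visual cue) and that the normalizing range is taken over all bins of all four trial types of a neuron. Both are one reading of the text rather than confirmed details, and the function name is hypothetical. The PCA step is not shown.

```python
TRIAL_TYPES = ("Hit", "Miss", "CR", "FA")


def normalize_and_concatenate(psths, n_baseline_bins=10):
    """Build one 160-element row from four 40-bin PSTHs of one neuron.

    psths: dict mapping each trial type to a list of firing rates
    (one value per 100 ms bin, 4 s window).
    """
    concatenated = []
    for trial_type in TRIAL_TYPES:
        psth = psths[trial_type]
        # Subtract the mean firing rate of the pre-cue baseline bins.
        baseline = sum(psth[:n_baseline_bins]) / n_baseline_bins
        concatenated.extend(rate - baseline for rate in psth)
    # Normalize by the range of values across all concatenated bins.
    value_range = max(concatenated) - min(concatenated)
    if value_range == 0:
        return [0.0] * len(concatenated)
    return [value / value_range for value in concatenated]
```

With 40 bins per trial type and four trial types, each row has 4 x 40 = 160 entries, matching the number of columns stated for the activity matrix.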
Next, we employed spectral embedding on the data to detect non-convex and more complex clusters (Abbe, 2017;Von Luxburg, 2007). To do so, we computed the similarity matrix $S \in \mathbb{R}^{5405 \times 5405}$ whose element at row $i$ and column $j$ measures the similarity between $x'_i$ and $x'_j$ as [...], where $\sigma$ is a free parameter determining how local similarity is measured in the feature space. We tuned $\sigma$ by setting the average of similarity values equal to 0.5 (the tuned value for $\sigma$ is 0.0987). Then, we computed the normalized Laplacian matrix $L$ as [...], where $I$ is the identity matrix, and $D$ is the diagonal degree matrix defined as [...] Figure S4B). Using the fitted parameters, we assigned a cluster index $c_i \in \{1, \dots, 24\}$ to each neuron corresponding to the Gaussian distribution to which it belongs with the highest probability. The output of the GMM step was the vector $c \in \{1, \dots, 24\}^{5405}$ containing the cluster indices of neurons.
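The tuning of the similarity parameter and the construction of a normalized Laplacian can be sketched as below. Because the equations are not recoverable from the text, the sketch makes explicit assumptions: a Gaussian kernel of the form exp(-sigma * squared distance), an average taken over all matrix entries including the diagonal, and the symmetric normalized Laplacian I - D^(-1/2) S D^(-1/2) described by Von Luxburg (2007). The authors may have used different forms. The eigendecomposition and the Gaussian mixture model step are not shown, and all function names are hypothetical.

```python
import math


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def similarity_matrix(points, sigma):
    """Assumed Gaussian kernel: S[i][j] = exp(-sigma * ||p_i - p_j||^2)."""
    return [[math.exp(-sigma * squared_distance(p, q)) for q in points]
            for p in points]


def mean_similarity(points, sigma):
    s = similarity_matrix(points, sigma)
    n = len(points)
    return sum(sum(row) for row in s) / (n * n)


def tune_sigma(points, target=0.5, tol=1e-9):
    """Bisection on sigma so that the mean similarity equals `target`.

    The mean similarity decreases monotonically as sigma grows.
    """
    low, high = 0.0, 1.0
    for _ in range(60):  # widen the bracket until the target is enclosed
        if mean_similarity(points, high) <= target:
            break
        high *= 2.0
    else:
        raise ValueError("target mean similarity cannot be reached")
    for _ in range(200):
        middle = 0.5 * (low + high)
        if mean_similarity(points, middle) > target:
            low = middle
        else:
            high = middle
        if high - low < tol:
            break
    return 0.5 * (low + high)


def normalized_laplacian(s):
    """Symmetric normalized Laplacian L = I - D^(-1/2) S D^(-1/2),
    where D is the diagonal matrix of row sums of S."""
    n = len(s)
    degree = [sum(row) for row in s]
    return [[(1.0 if i == j else 0.0) - s[i][j] / math.sqrt(degree[i] * degree[j])
             for j in range(n)] for i in range(n)]
```

In a full pipeline, the eigenvectors of the Laplacian with the smallest eigenvalues would provide the embedding on which the mixture model is fitted.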
To study the patterns captured by different clusters, we first quantified the proportion of neurons within each cluster belonging to either Expert or Novice mice.
To account for the differences in the total number of neurons belonging to each group (n=3960 neurons from Expert, n=1445 neurons from Novice), weighted proportions were considered. Next, for each cluster, we quantified the distribution of neurons across different brain regions in Novice and Expert mice (Figure 4B-C, S4C). Similarly, in computing these distributions, weighted proportions were considered to correct for the difference in sample sizes. In addition, for selected clusters (Figure 4C, right panel), 2D-maps were computed by plotting the spatial distribution of neurons (based on their reconstructed anatomical location) across dorsal cortex in bins of 50x50 µm; the value for each bin was normalized by the total number of neurons within the bin. These maps were then smoothed using a 2-D Gaussian kernel (sigma=150 µm).
To characterize changes across learning of the delay task in each area, we computed, separately in Novice and Expert mice, the activity pattern of the two most representative clusters (i.e. clusters with the highest number of neurons among all clusters) (Figure 4D and S4D) by averaging the activity among neurons belonging to the pair of area and cluster.
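One plausible way to compute such a weighted proportion is to express each group's count in a cluster as a fraction of that group's total and then normalize the two fractions. The Python sketch below follows this reading. The exact weighting used by the authors is not specified, so the formula and the function name are assumptions.

```python
def weighted_expert_proportion(n_expert_in_cluster, n_novice_in_cluster,
                               n_expert_total=3960, n_novice_total=1445):
    """Proportion of a cluster attributed to Expert mice after correcting
    for the different total numbers of Expert and Novice neurons."""
    expert_fraction = n_expert_in_cluster / n_expert_total
    novice_fraction = n_novice_in_cluster / n_novice_total
    total = expert_fraction + novice_fraction
    if total == 0:
        raise ValueError("cluster contains no neurons")
    return expert_fraction / total
```

Under this definition, a cluster containing 10% of all Expert neurons and 20% of all Novice neurons has a weighted Expert proportion of one third, even though its raw Expert count (396) exceeds its Novice count (289).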

GLM encoding model
We used Poisson regression to fit an encoding model (Generalized linear model, GLM) to predict the spiking activity of each individual neuron given behavioral data (Nelder and Wedderburn, 1972;Park et al., 2014). For each session, we concatenated all correct trials (Hit and Correct-rejection) and then split the data to perform five-fold cross-validation. In Poisson regression, one aims at predicting the spike count $Y(t)$ in a time bin $t$ according to the formula: $Y(t) \sim \mathrm{Poisson}\left(\exp\left(X(t)\,w\right)\right)$, i.e. assuming that the spike counts are sampled from a Poisson distribution with a rate that depends on the design matrix $X$ and on the weight vector $w$. In our case, $Y$ was constructed by binning the spikes in 100 ms bins. The weights $w$ were fit by maximizing the likelihood with Ridge regularization for each fold, and then averaged across the five folds. The parameter $\lambda$ that controls the strength of the regularization was determined separately for each neuron using evidence optimization (Cunningham et al., 2008;Park et al., 2014).
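A Ridge-regularized Poisson regression of this kind can be sketched in plain Python with gradient descent. The sketch assumes the standard exponential (log-link) rate function, penalizes all weights including any intercept column, and uses a fixed regularization strength. Evidence optimization of that strength and the five-fold cross-validation are not implemented, and the function names, step size and iteration count are illustrative choices, not the authors' settings.

```python
import math


def linear_predictor(row, weights):
    return sum(x * w for x, w in zip(row, weights))


def penalized_neg_log_likelihood(design, counts, weights, ridge):
    """Poisson negative log-likelihood (constant log(y!) term dropped)
    plus the Ridge penalty 0.5 * ridge * ||w||^2."""
    value = 0.0
    for row, count in zip(design, counts):
        eta = linear_predictor(row, weights)
        value += math.exp(eta) - count * eta
    return value + 0.5 * ridge * sum(w * w for w in weights)


def fit_poisson_ridge(design, counts, ridge=1.0, step=0.01, n_iter=5000):
    """Minimize the penalized negative log-likelihood by gradient descent."""
    weights = [0.0] * len(design[0])
    for _ in range(n_iter):
        gradient = [ridge * w for w in weights]
        for row, count in zip(design, counts):
            rate = math.exp(linear_predictor(row, weights))
            for j, x in enumerate(row):
                gradient[j] += (rate - count) * x
        weights = [w - step * g for w, g in zip(weights, gradient)]
    return weights
```

Each row of `design` holds the regressors for one 100 ms bin and `counts` holds the corresponding spike counts. A larger `ridge` value shrinks the weights toward zero.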
The design matrix was constructed by including three types of variables: "event" variables, associated with task-related events; "analog" variables, associated with real-valued behavioral measures from videography; and "slow" variables, which were constant during one trial but could vary over the course of one session. Event variables included the visual cue onset, the whisker stimulus onset, the auditory cue onset and the onset of the first lick. The exact time of lick onset was determined from the high-speed video using a custom algorithm. To assess the delayed effect of such task-related variables, each of these event-like variables was associated with a set of ten 100-ms wide and unit height boxcar basis functions, spanning in total one second after each event. The first-lick variable was associated with two additional boxcar functions covering 0.2 seconds prior to the lick onset, to capture lick-specific preparatory neuronal activity. Analog variables included in the design matrix were the whisker, tongue and jaw speed. These quantities were first extracted from the high-speed videos using custom code and then averaged in 100 ms bins. Among the slow variables, we included the trial index, i.e. a variable that at each trial $i$ took a constant value equal to $i/N$, where $N$ is the total number of trials in a session. This variable could capture shifts in a neuron's baseline activity due to slow effects across the session such as changes in satiety and motivation. Finally, we included three binary variables that took value one only if the previous trial was an early lick, a False-alarm or a Hit trial, to capture the effect of the previous trial outcome on the subsequent trial. In total, our design matrix had 50 columns, corresponding to the number of free parameters of the model.
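The structure of the design matrix for a single trial can be sketched in Python as below. The listed regressors add up to 49 columns (4 events x 10 boxcars, 2 pre-lick boxcars, 3 analog variables, 1 trial index and 3 previous-outcome flags). The sketch assumes that the remaining column is a constant intercept, which the text does not state explicitly. Function names, argument layout and the example bin indices are hypothetical.

```python
def boxcar_columns(n_bins, event_bin, lags):
    """One indicator column per lag: the column for lag k is 1 in bin
    event_bin + k and 0 elsewhere. A missing event gives all-zero columns."""
    columns = []
    for lag in lags:
        column = [0.0] * n_bins
        if event_bin is not None and 0 <= event_bin + lag < n_bins:
            column[event_bin + lag] = 1.0
        columns.append(column)
    return columns


def trial_design_columns(n_bins, event_bins, lick_bin, analog,
                         trial_fraction, previous_outcome):
    """Return the regressor columns for one trial (100 ms bins).

    event_bins: dict with bin indices for 'visual', 'whisker', 'auditory'
                (None if the event did not occur in this trial).
    lick_bin: bin index of the first lick, or None.
    analog: dict with per-bin 'whisker', 'tongue' and 'jaw' speeds.
    trial_fraction: trial index divided by the number of trials.
    previous_outcome: three 0/1 flags (early lick, False-alarm, Hit).
    """
    columns = []
    for name in ("visual", "whisker", "auditory"):
        columns += boxcar_columns(n_bins, event_bins.get(name), range(10))
    # First lick: two boxcars before onset plus ten after it.
    columns += boxcar_columns(n_bins, lick_bin, range(-2, 10))
    for name in ("whisker", "tongue", "jaw"):
        columns.append(list(analog[name]))
    columns.append([trial_fraction] * n_bins)
    for flag in previous_outcome:
        columns.append([float(flag)] * n_bins)
    return columns
```

Stacking these columns over all correct trials of a session, and adding the assumed intercept, would give a matrix with one row per time bin and 50 columns.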
To assess the significance of each variable in the design matrix, we fitted a new GLM model obtained by removing the variable of interest (reduced model) from the full model. If for a certain neuron the reduced model fitted the data significantly worse than the full model (p<0.05, according to a likelihood ratio test (Buse, 1982)), then that neuron was considered significantly modulated by the removed variable. The reduced model was fitted independently for each fold, using the same data splitting used for the full model. In the likelihood ratio test, the test statistic is given by $2\log(L_f/L_r)$, where $L_f$ and $L_r$ are the full and reduced model likelihoods, respectively. These statistics were computed for each fold and then averaged to obtain an average statistic, from which the final p-value was computed (Buse, 1982). Note that in the presence of correlations among variables, this approach is stringent in that it tends to underestimate the significance of different variables. To separately assess the effect of the onset of event-like variables from their delayed effects, we quantified their significance independently by separately removing the first two basis functions or the remaining eight basis functions (Visual, Auditory and Lick). For the whisker variable, since it was very brief in time (10 ms), we removed either the first or the remaining nine bins (referred to as 'Whisker' and 'Delay' respectively in Figure 5D and 5E). To assess the significance of the modulation due to lick-preparatory neuronal activity, we separately removed the two basis functions that preceded the lick onset (referred to as 'PreLick' in Figure 5D and 5E). Spatial weight maps for selected model variables (Figure 5C, right panel) were built by first averaging the weights over the time course of the variable, i.e. by averaging over the weights of the boxcar basis functions.
Next, for each neuron these weights were projected onto the reconstructed anatomical location in 2D, and were then averaged across all neurons within a given spatial bin (50x50 µm). The resulting spatial weight map was smoothed using a 2D Gaussian kernel (sigma=150 µm). All the GLM analysis was performed in MATLAB using a combination of existing and custom-written code.
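The likelihood ratio test described above can be written out as follows. Here $\ell_f$ and $\ell_r$ denote the log-likelihoods of the full and reduced models, $\Lambda_k$ is the statistic of fold $k$, and $d$ is the number of parameters removed from the full model (for example, $d = 2$ for the 'PreLick' basis functions). Referring the averaged statistic to a $\chi^2$ distribution with $d$ degrees of freedom is the standard asymptotic result for nested models and is assumed here; the text does not state which reference distribution was used.

```latex
\Lambda = 2\log\frac{L_f}{L_r} = 2\left(\ell_f - \ell_r\right),
\qquad
\bar{\Lambda} = \frac{1}{5}\sum_{k=1}^{5}\Lambda_k,
\qquad
p = \Pr\left(\chi^2_{d} \geq \bar{\Lambda}\right)
```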

Assessing optogenetic inactivation impact
To quantify the impact of optogenetic inactivation, we compared mouse-averaged performance (n=9; Hit rate, False-alarm rate and Early lick rate) for different light windows (i.e. Baseline, Whisker, Delay, Response) to light-off control trials. P-values were corrected for multiple comparisons (i.e. 4 windows) using Bonferroni correction.

Quantifying involvement index
The involvement index was defined by combining the neuronal correlates and behavioral impact of optogenetic inhibition. For each pair of area and temporal window of interest, we built two distributions of bootstrap estimation of the mean, separately for neuronal correlates and inhibition impact, by bootstrapping 1000 times. The neuronal correlates were quantified as the mean firing rate difference in Hit vs Correct rejection trials across all neurons recorded from 25 Expert mice. The inhibition impact was quantified as the mean change in Hit rate across 9 VGAT-ChR2 mice. The distribution of involvement index was calculated as the product distribution of the two bootstrap distributions.
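A minimal Python sketch of this bootstrap procedure is given below. It forms the product distribution by multiplying independently resampled means element by element, which is one possible reading of the description. The sign convention, the fixed random seed and the function names are assumptions made for illustration.

```python
import random


def bootstrap_means(values, n_boot, rng):
    """Bootstrap distribution of the mean: resample with replacement
    n_boot times and return the mean of each resample."""
    n = len(values)
    return [sum(rng.choice(values) for _ in range(n)) / n
            for _ in range(n_boot)]


def involvement_index(firing_rate_differences, hit_rate_changes,
                      n_boot=1000, seed=0):
    """Distribution of the involvement index for one area and time window.

    firing_rate_differences: per-neuron Hit minus Correct-rejection rates.
    hit_rate_changes: per-mouse change in Hit rate under inactivation.
    """
    rng = random.Random(seed)
    neuronal = bootstrap_means(firing_rate_differences, n_boot, rng)
    behavioral = bootstrap_means(hit_rate_changes, n_boot, rng)
    return [a * b for a, b in zip(neuronal, behavioral)]
```

The resulting list of 1000 products approximates the distribution of the index, from which a mean and a confidence interval can be read off.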

Statistics
Data are represented as mean ± SEM unless otherwise noted. The Wilcoxon signed rank test was used to assess significance in paired comparisons; and the Wilcoxon rank sum test was used for unpaired comparisons (Matlab implementations). Analysis of spiking activity and involvement index was performed using a non-parametric permutation test. The statistical tests used and n numbers are reported explicitly in the main text or figure legends.

Data and code availability
The complete data set and Matlab analysis code will be made freely available at the open access CERN Zenodo database https://zenodo.org/communities/petersen-labdata.