Abstract

In reinforcement learning, an agent makes sequential decisions to maximize reward. During learning, the actual and expected outcomes are compared to tell whether a decision was good or bad. The difference between the actual and expected outcome is the prediction error, which can serve as a teaching signal in reinforcement learning. Prediction errors can be categorised into state prediction errors (SPEs) and reward prediction errors (RPEs). fMRI studies have revealed the brain areas where the reward prediction error and the state prediction error are computed (Haruno & Kawato 2006; McClure et al. 2003; O'Doherty et al. 2003; D'Ardenne et al. 2008; Gläscher et al. 2010). Here, using 128-channel EEG, we show when the SPE and RPE are computed. In our study, participants saw an image on a computer screen and were asked to click one of three or four buttons, which, depending on the choice, led to the presentation of a new image, until a goal image was reached. After participants had learned the path to the goal, we swapped two images. The swapped images created an SPE, which was correlated with a significant change in the frontal N1 component of the event-related potential (ERP). To estimate the RPE, we fit participants' performance with the SARSA(λ) reinforcement learning algorithm. A 200-400 ms time window in the ERP reflected the magnitudes of the RPEs produced by this algorithm well (r = 0.51, p = 0.02). Our results show that the frontal P3 component of the ERP reflects the reward prediction process, while the state prediction process is reflected by the frontal N1 component, which is in line with mismatch negativity studies (Campbell et al. 2007).
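
The abstract refers to fitting behaviour with SARSA(λ). As an illustration only (this is a minimal sketch of tabular SARSA(λ), not the authors' fitting code, and alpha, gamma and lam are illustrative hyperparameters rather than fitted values), the RPE arises as the temporal-difference error of the update:

```python
import numpy as np

def sarsa_lambda_step(Q, E, s, a, r, s_next, a_next,
                      alpha=0.1, gamma=0.9, lam=0.9):
    """One tabular SARSA(lambda) update; returns the RPE (TD error).

    Q: (n_states, n_actions) action-value table
    E: (n_states, n_actions) eligibility traces
    """
    # RPE: actual outcome (reward plus discounted next value)
    # minus the expected outcome Q[s, a]
    delta = r + gamma * Q[s_next, a_next] - Q[s, a]
    E[s, a] += 1.0          # accumulate trace for the visited state-action pair
    Q += alpha * delta * E  # credit all recently visited pairs via their traces
    E *= gamma * lam        # decay traces toward zero
    return delta
```

In a model-fitting setting, the delta returned on each trial would be the model-derived RPE magnitude that is then correlated with the trial-wise ERP amplitude.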
