Abstract

In chess, a series of moves is made before delayed, sparse feedback (win or loss) is issued, which makes it impossible to evaluate the value of a single move. Powerful reinforcement learning (RL) algorithms can cope with such sequential decision-making situations. A crucial component of these algorithms is the reward prediction error (RPE), which measures the difference between the actual and the predicted reward. Here, we show that the RPE is well reflected in a frontal negativity of the EEG. Participants saw an image on a computer screen and were asked to click one of, for example, four buttons; depending on the choice, a new image was presented, and this continued until a goal state was reached. A 128-channel EEG was recorded. To estimate the RPE, we fit state-of-the-art RL algorithms to participants' performance and chose the best-fitting algorithm. Two time windows (170-286 ms and 400-580 ms) of the event-related potential (ERP) reflected the magnitudes of this algorithm's RPEs well. The ERP magnitude in the late time window was significantly correlated with the RPEs (r² = 0.14, p < 0.001). Compared with previous studies, this negativity occurred 200 ms later, likely because of the more complex sequential decision-making paradigm.
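To make the RPE concrete, the sketch below shows a minimal temporal-difference update in the style of Q-learning, in which the RPE is the difference between the obtained reward (plus discounted future value) and the predicted value of the chosen action. The state and action counts, learning rate, and discount factor are illustrative assumptions, not the authors' fitted model.

```python
import numpy as np

# Illustrative Q-learning sketch (not the authors' fitted model):
# the RPE (delta) is the difference between the target value based on the
# received reward and the value predicted for the chosen action.
n_states, n_actions = 10, 4          # assumption: images as states, buttons as actions
alpha, gamma = 0.1, 0.9              # assumption: learning rate, discount factor
Q = np.zeros((n_states, n_actions))  # predicted action values

def td_update(state, action, reward, next_state, done):
    """Return the reward prediction error and update Q in place."""
    predicted = Q[state, action]
    target = reward if done else reward + gamma * np.max(Q[next_state])
    rpe = target - predicted          # reward prediction error (delta)
    Q[state, action] += alpha * rpe   # learning step driven by the RPE
    return rpe

# Example trial: clicking button 2 on image 3 yields no reward and leads to image 5.
delta = td_update(state=3, action=2, reward=0.0, next_state=5, done=False)
```

In such models, the trial-by-trial deltas are the quantities that can be correlated with single-trial ERP amplitudes.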
