Abstract

Reinforcement learning can be viewed as a form of supervised learning in which the reward signal is sparse and delayed. In chess, for example, a series of moves is made before a sparse reward (win or loss) is issued, which makes it difficult to evaluate the value of any single move. Still, there are powerful algorithms that can learn from such delayed and sparse feedback. To investigate how visual reinforcement learning is determined by the structure of the RL problem, we designed a new paradigm in which we presented an image and asked human observers to choose an action (pressing one of several buttons). The chosen action leads to the next image, and this continues until observers reach a goal image. Different learning situations are determined by the image-action matrix, which defines a so-called environment. We first tested whether humans can transfer information learned in a simple environment to solve more complex ones. The results showed no evidence supporting this hypothesis. We then tested our paradigm on several environments with different graph-theoretical features, such as regular vs. irregular structure. We found that humans performed better in environments that contain fewer image-action pairs on the path to the goal. We tested various RL algorithms and found that they performed worse than humans.
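The kind of environment described above can be sketched in code. The following is a minimal, illustrative toy version only, not the paper's actual setup: states stand in for images, a transition table plays the role of the image-action matrix, reward is sparse (issued only on reaching the goal image), and tabular Q-learning stands in for the RL algorithms tested. The specific environment, state count, and hyperparameters are assumptions for illustration.

```python
import random

# Toy "image-action matrix": maps (state, action) to the next state.
# States 0-3 stand in for images; state 3 is the (absorbing) goal image.
TRANSITIONS = {
    (0, 0): 1, (0, 1): 2,
    (1, 0): 3, (1, 1): 0,
    (2, 0): 0, (2, 1): 3,
    (3, 0): 3, (3, 1): 3,  # goal state is absorbing
}
GOAL = 3
N_STATES = 4
N_ACTIONS = 2


def q_learning(episodes=500, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration.

    Reward is sparse: 1.0 on reaching the goal image, 0.0 otherwise,
    mirroring the delayed-feedback structure described in the abstract.
    """
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in range(N_ACTIONS)}
    for _ in range(episodes):
        s = 0
        for _ in range(20):  # cap episode length
            if rng.random() < eps:
                a = rng.randrange(N_ACTIONS)          # explore
            else:
                a = max(range(N_ACTIONS), key=lambda b: q[(s, b)])  # exploit
            s2 = TRANSITIONS[(s, a)]
            r = 1.0 if s2 == GOAL else 0.0
            # Standard Q-learning update on the sparse, delayed reward.
            best_next = max(q[(s2, b)] for b in range(N_ACTIONS))
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
            s = s2
            if s == GOAL:
                break
    return q


def greedy_path(q, start=0, max_steps=10):
    """Follow the learned greedy policy from the start image to the goal."""
    s, path = start, [start]
    for _ in range(max_steps):
        a = max(range(N_ACTIONS), key=lambda b: q[(s, b)])
        s = TRANSITIONS[(s, a)]
        path.append(s)
        if s == GOAL:
            break
    return path


q = q_learning()
print(greedy_path(q))  # a shortest path from image 0 to the goal image
```

In this toy environment both available first actions lie on a shortest path of two image-action pairs, so the learned greedy policy reaches the goal in two steps; environments with longer paths to the goal would correspondingly take more steps.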

Details