Crowding and the Architecture of the Visual System

Doerig, Adrien Christophe

doi:10.5075/epfl-thesis-7582

doctoral thesis

2020

Classically, vision is seen as a cascade of local, feedforward computations. This framework has been tremendously successful, inspiring a wide range of ground-breaking findings in neuroscience and computer vision. Recently, feedforward Convolutional Neural Networks (ffCNNs), a kind of deep neural network inspired by this classic framework, have revolutionized computer vision and been adopted as tools in neuroscience. However, despite these successes, there is much more to vision. First, there are flagrant architectural differences between biological systems and the classic framework. For example, recurrence is abundant in the brain but absent from the classic framework and ffCNNs. Although there is widespread agreement about the importance of these recurrent connections, their computational role is still poorly understood. Second, these architectural differences lead to behavioural differences too, highlighted by psychophysical evidence. Relatedly, ffCNNs are extremely vulnerable to small changes to their inputs and do not generalize well beyond the dataset used to train them. Human vision, in contrast, is much more robust. New insights are needed to face up to these challenges.
In this thesis, I use visual crowding and related psychophysical effects as probes into visual processes that go beyond the classic framework. In crowding, perception of a target deteriorates in clutter. I focus on global aspects of crowding, in which perception of a small target is strongly modulated by the global configuration of elements across the visual field. I show that models based on the classic framework, including ffCNNs, cannot explain these effects for principled reasons and identify recurrent grouping and segmentation as a key missing ingredient. Then, I show that capsule networks, a recent kind of deep learning architecture combining the power of ffCNNs with recurrent grouping and segmentation, naturally explain these effects. I provide psychophysical evidence that humans indeed use a similar recurrent grouping and segmentation strategy in global crowding effects.
In crowding, visual elements interfere across space. To study how elements interfere over time, I use the Sequential Metacontrast psychophysical paradigm, in which perception of visual elements depends on elements presented hundreds of milliseconds later. I psychophysically characterize the temporal structure of this interference and propose a simple computational model. My results support the idea that perception is a discrete process. I lay out theoretical implications of these findings. Together, the results presented here provide stepping-stones towards a fuller understanding of the visual system by suggesting architectural changes needed for more human-like neural computations.

Use this identifier to reference this record

https://infoscience.epfl.ch/handle/20.500.14299/165129

Name: EPFL_TH7582.pdf
Access type: openaccess
Size: 10.52 MB
Format: Adobe PDF
Checksum (MD5): 9ebe784e5cf959fa690a187d73a988ad