Infoscience
doctoral thesis

The Physics of Data and Tasks: Theories of Locality and Compositionality in Deep Learning

Favero, Alessandro  
2025

Deep neural networks have achieved remarkable success, yet our understanding of how they learn remains limited. These models can learn high-dimensional tasks, which is generally statistically intractable due to the curse of dimensionality. This apparent paradox suggests that learnable data must have an underlying latent structure. What is the nature of this structure? How do neural networks encode and exploit it, and how does it quantitatively impact performance? For instance, how does generalization improve with the number of training examples? This thesis addresses these questions by studying the roles of locality and compositionality in data, tasks, and deep learning representations.
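
As a point of reference for the intractability claim (a textbook curse-of-dimensionality scaling, not a result of the thesis): for a generic Lipschitz target function in d dimensions, the best achievable test error decays as

    \epsilon(n) \sim n^{-\beta}, \qquad \beta = \Theta(1/d),

so reaching a fixed accuracy \epsilon requires n \sim \epsilon^{-\Theta(d)} training examples, i.e. exponentially many in the dimension. Learnable high-dimensional data must therefore carry additional structure for this scaling to be beaten.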

We begin by analyzing convolutional neural networks in the limit of infinite width, where the learning dynamics simplifies and becomes analytically tractable. Using tools from statistical physics and learning theory, we characterize their generalization abilities and show that, by adapting to the spatial scale of the target function, they can overcome the curse of dimensionality when the target is local.
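
To illustrate the object of study (a minimal sketch under simplifying assumptions, not the thesis' actual setup): in the infinite-width kernel regime, a convolutional network behaves like kernel regression with a convolutional kernel. The toy kernel below averages an RBF kernel over non-overlapping patches, and the teacher is "local", depending only on the first patch of the input; the sizes, Gaussian data, and ridge parameter are hypothetical choices for illustration.

    import numpy as np

    rng = np.random.default_rng(0)
    d, s, n_train, n_test = 32, 4, 512, 256   # input dimension, patch size, sample sizes (hypothetical)

    def conv_kernel(X, Y, s, gamma=1.0):
        # K(x, y) = mean over patches p of exp(-gamma * ||x_p - y_p||^2)
        Xp = X.reshape(X.shape[0], -1, s)     # split inputs into non-overlapping patches
        Yp = Y.reshape(Y.shape[0], -1, s)
        sq = ((Xp[:, None] - Yp[None, :]) ** 2).sum(-1)   # (n, m, n_patches)
        return np.exp(-gamma * sq).mean(-1)

    def local_target(X, s):
        # "local" teacher: depends only on the first patch of the input
        return np.sin(X[:, :s]).sum(axis=1)

    X_tr = rng.standard_normal((n_train, d))
    X_te = rng.standard_normal((n_test, d))
    y_tr, y_te = local_target(X_tr, s), local_target(X_te, s)

    # kernel ridge regression: alpha = (K + lambda * I)^{-1} y,  f(x) = K(x, X_tr) @ alpha
    K = conv_kernel(X_tr, X_tr, s)
    alpha = np.linalg.solve(K + 1e-6 * np.eye(n_train), y_tr)
    y_pred = conv_kernel(X_te, X_tr, s) @ alpha
    print("test MSE:", float(((y_pred - y_te) ** 2).mean()))

Repeating this for increasing n, and comparing against a global (non-convolutional) kernel, is the kind of learning-curve experiment that the generalization analysis above characterizes.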

We then turn to more complex structures in which features are composed hierarchically, with elements at larger scales built from sub-features at smaller ones. We model such data using simple probabilistic context-free grammars: tree-like graphical models used to describe data such as language and images. Within this framework, we study how diffusion-based generative models compose new data by assembling features learned from examples. This theory of composition predicts a phase transition in the generative process, which we confirm empirically in both image and language modalities, providing support for the compositional structure of natural data. We further demonstrate that the sample complexity of learning these grammars scales polynomially with the data dimension, identifying a mechanism by which diffusion models avoid the curse of dimensionality: they learn to compose new data hierarchically. These results offer a theoretical grounding for how generative models learn to generalize and, ultimately, become creative.
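
To make the data model concrete, here is a minimal sketch of sampling from a probabilistic context-free grammar (the grammar below is a made-up toy, not one used in the thesis): each nonterminal picks a production rule at random and expands recursively, so every sample is generated by a latent tree that composes sub-features into larger-scale features.

    import random

    # toy grammar (hypothetical): each nonterminal expands into one of several
    # right-hand sides, chosen with the given probability
    RULES = {
        "S": [(("A", "B"), 0.7), (("B", "A"), 0.3)],
        "A": [(("a",), 0.5), (("A", "a"), 0.5)],
        "B": [(("b",), 0.6), (("b", "B"), 0.4)],
    }

    def sample(symbol):
        if symbol not in RULES:            # terminal symbol: emit it as-is
            return [symbol]
        rhs_options, probs = zip(*RULES[symbol])
        rhs = random.choices(rhs_options, weights=probs)[0]   # pick a production rule
        out = []
        for child in rhs:                  # recurse: the derivation is a tree
            out += sample(child)
        return out

    for _ in range(5):
        print(" ".join(sample("S")))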

Finally, we shift our analysis from the structure of data in input space to the structure of tasks in the model's parameter space. Here, we investigate a novel form of compositionality in which tasks and skills themselves can be composed. In particular, we empirically demonstrate that distinct directions in the weight space of large pre-trained models correspond to localized, semantically meaningful, task-specific regions of function space, and we show how this modular structure enables task arithmetic and model editing at scale.
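
The task-arithmetic operation referred to above can be written in a few lines. The sketch below (the state dicts and scaling coefficients are placeholders, not the checkpoints studied in the thesis) builds a task vector as the difference between fine-tuned and pre-trained weights, then adds scaled combinations of such vectors back to the pre-trained model to compose or remove skills.

    import torch

    def task_vector(pretrained, finetuned):
        # tau = theta_finetuned - theta_pretrained, computed parameter by parameter
        return {k: finetuned[k] - pretrained[k] for k in pretrained}

    def apply_task_vectors(pretrained, task_vectors, alphas):
        # theta_edited = theta_pretrained + sum_i alpha_i * tau_i
        edited = {k: v.clone() for k, v in pretrained.items()}
        for tau, a in zip(task_vectors, alphas):
            for k in edited:
                edited[k] += a * tau[k]
        return edited

    # usage with plain state dicts (random tensors stand in for real checkpoints)
    theta_pre  = {"w": torch.randn(4, 4)}
    theta_ft_1 = {"w": theta_pre["w"] + 0.1 * torch.randn(4, 4)}   # "task 1" fine-tune
    theta_ft_2 = {"w": theta_pre["w"] + 0.1 * torch.randn(4, 4)}   # "task 2" fine-tune

    taus = [task_vector(theta_pre, theta_ft_1), task_vector(theta_pre, theta_ft_2)]
    theta_multi   = apply_task_vectors(theta_pre, taus, alphas=[0.5, 0.5])    # compose both skills
    theta_unlearn = apply_task_vectors(theta_pre, taus[:1], alphas=[-1.0])    # subtract task 1 to forget it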

Files

Name: EPFL_TH11462.pdf
Type: Main Document
Version: Not Applicable (or Unknown)
Access type: openaccess
License Condition: N/A
Size: 27.2 MB
Format: Adobe PDF
Checksum (MD5): de5af910353b5fe38bcc6ae3ad1b7c0d

Contact: infoscience@epfl.ch

Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved.