Publication:

The inductive bias of deep learning: Connecting weights and functions

cris.lastimport.scopus

2024-08-07T10:46:53Z

cris.lastimport.wos

2024-07-29T05:34:16Z

cris.legacyId

306613

cris.virtual.author-scopus

7005865062

cris.virtual.department

LTS4

cris.virtual.parent-organization

IEM

cris.virtual.parent-organization

STI

cris.virtual.parent-organization

EPFL

cris.virtual.parent-organization

EDOC

cris.virtual.parent-organization

ETU

cris.virtual.sciperId

101475

cris.virtual.sciperId

299435

cris.virtual.unitId

10851

cris.virtual.unitManager

Frossard, Pascal

cris.virtualsource.author-scopus

e8dafcb2-ee80-46b8-9351-5242e0ee0245

cris.virtualsource.author-scopus

1b724cc5-d9f9-45a3-9db3-c5da4473ee89

cris.virtualsource.department

e8dafcb2-ee80-46b8-9351-5242e0ee0245

cris.virtualsource.department

1b724cc5-d9f9-45a3-9db3-c5da4473ee89

cris.virtualsource.orcid

e8dafcb2-ee80-46b8-9351-5242e0ee0245

cris.virtualsource.orcid

1b724cc5-d9f9-45a3-9db3-c5da4473ee89

cris.virtualsource.parent-organization

fed12497-58d6-4287-ab16-dcfef0a03016

cris.virtualsource.parent-organization

b90b43a5-ca1d-4299-9a3f-53ece71373f9

cris.virtualsource.parent-organization

e241245b-0e63-4d9e-806e-b766e62006ef

cris.virtualsource.parent-organization

5520d273-fb5f-458c-93a9-f6a9eee8961b

cris.virtualsource.rid

e8dafcb2-ee80-46b8-9351-5242e0ee0245

cris.virtualsource.rid

1b724cc5-d9f9-45a3-9db3-c5da4473ee89

cris.virtualsource.sciperId

e8dafcb2-ee80-46b8-9351-5242e0ee0245

cris.virtualsource.sciperId

1b724cc5-d9f9-45a3-9db3-c5da4473ee89

cris.virtualsource.unitId

fed12497-58d6-4287-ab16-dcfef0a03016

cris.virtualsource.unitManager

fed12497-58d6-4287-ab16-dcfef0a03016

datacite.rights

openaccess

dc.contributor.advisor

Frossard, Pascal

dc.contributor.author

Ortiz Jimenez, Guillermo

dc.date.accepted

2023

dc.date.accessioned

2023-11-22T08:48:21

dc.date.available

2023-11-22T08:48:21

dc.date.created

2023-11-22

dc.date.issued

2023

dc.date.modified

2025-05-28T07:52:10.016301Z

dc.description.abstract

Years of fierce competition have naturally selected the fittest deep learning algorithms. Yet, although these models work well in practice, we still lack a proper characterization of why they do so. This raises serious questions about the robustness, trustworthiness, and fairness of modern AI systems. This thesis aims to help bridge this gap by advancing the empirical and theoretical understanding of deep learning, with particular emphasis on the intricate relationship between weight space and function space and how it shapes the inductive bias.

Our investigation starts with the simplest possible learning scenario: learning linearly separable hypotheses. Despite the simplicity of this setting, our analysis reveals that most networks exhibit a nuanced inductive bias on these tasks that depends on the direction of separability. Specifically, we show that this bias can be encapsulated in an ordered sequence of vectors, the neural anisotropy directions (NADs), which encode the preference of a network to separate the training data along a given direction. The NADs can be obtained by randomly sampling the weight space. This not only establishes a strong connection between the functional landscape and the directional bias of each architecture but also offers a new lens for examining inductive biases in deep learning.
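
A minimal sketch of this idea, under illustrative assumptions: candidate NADs are taken as the principal directions of input gradients sampled over random initializations. The toy MLP, input dimension, and sample count below are placeholders, not the thesis' exact protocol.

```python
# Hypothetical sketch: estimating neural anisotropy directions (NADs)
# by sampling the weight space. Architecture and sizes are illustrative.
import torch
import torch.nn as nn

def make_mlp(d):
    return nn.Sequential(nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, 1))

def sample_input_gradients(d=32, n_inits=200):
    """Gradient of the scalar output w.r.t. the input, for many random
    initializations; directional bias shows up in how these gradients
    concentrate along certain input directions."""
    grads = []
    for _ in range(n_inits):
        model = make_mlp(d)                      # fresh random weights
        x = torch.randn(1, d, requires_grad=True)
        model(x).sum().backward()
        grads.append(x.grad.view(-1))
    return torch.stack(grads)                    # (n_inits, d)

def neural_anisotropy_directions(grads):
    """NADs as eigenvectors of the gradient second-moment matrix,
    ordered by decreasing eigenvalue."""
    cov = grads.T @ grads / grads.shape[0]
    eigvals, eigvecs = torch.linalg.eigh(cov)    # ascending eigenvalues
    order = torch.argsort(eigvals, descending=True)
    return eigvecs[:, order], eigvals[order]

nads, spectrum = neural_anisotropy_directions(sample_input_gradients())
```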

We then turn our attention to modelling the inductive bias towards a more general set of hypotheses. To do so, we explore the applicability of the neural tangent kernel (NTK) as an analytical tool to approximate the functional landscape. Our research shows that NTK approximations can indeed gauge the relative learning complexities of numerous tasks, even when they cannot predict absolute network performance. This approximation works best when the learned weights lie close to the initialization. This provides a nuanced understanding of the NTK's ability to capture inductive bias, laying the groundwork for its application in our subsequent investigations.
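
As a hedged illustration of the quantity involved (not the thesis' exact experiments), the empirical NTK of a network at its current weights is the inner product of parameter Jacobians, K(x, x') = <df/dtheta(x), df/dtheta(x')>. The toy model and inputs below are assumptions for the example.

```python
# Minimal sketch of an empirical NTK around initialization, used as a
# proxy for the functional landscape. Model and inputs are illustrative.
import torch
import torch.nn as nn
from torch.func import functional_call, jacrev

model = nn.Sequential(nn.Linear(8, 32), nn.Tanh(), nn.Linear(32, 1))
params = dict(model.named_parameters())

def f(p, x):
    # scalar network output as a function of the parameters
    return functional_call(model, p, (x.unsqueeze(0),)).squeeze()

def empirical_ntk(x1, x2):
    """K(x1, x2) = <df/dtheta(x1), df/dtheta(x2)> at the current weights."""
    j1 = jacrev(f)(params, x1)
    j2 = jacrev(f)(params, x2)
    return sum(j1[k].flatten() @ j2[k].flatten() for k in j1)

x_a, x_b = torch.randn(8), torch.randn(8)
print(empirical_ntk(x_a, x_b))
```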

The thesis then explores two critical issues in deep learning research. First, we scrutinize implicit neural representations (INRs) and their ability to encode rich multimedia signals. Drawing inspiration from harmonic analysis and our earlier findings, we show that the NTK's eigenfunctions act as dictionary atoms whose inner products with the target signal determine the final reconstruction performance. INRs that use sinusoidal embeddings to encode the input can modulate the NTK so that its eigenfunctions constitute a meaningful basis. This insight has the potential to accelerate the development of principled algorithms for INRs, offering new avenues for architectural improvements and design.
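
A minimal sketch of an INR with a sinusoidal input embedding, assuming random Fourier features and a toy 1D target signal; the frequency scale, widths, and training loop below are illustrative assumptions, not the thesis' exact setup.

```python
# Hypothetical sketch: an implicit neural representation whose sinusoidal
# embedding reshapes the NTK so its eigenfunctions suit the signal.
import torch
import torch.nn as nn

class FourierINR(nn.Module):
    def __init__(self, in_dim=1, n_freq=64, width=128, scale=10.0):
        super().__init__()
        # fixed random projection defining the embedding frequencies
        self.register_buffer("B", scale * torch.randn(n_freq, in_dim))
        self.net = nn.Sequential(
            nn.Linear(2 * n_freq, width), nn.ReLU(),
            nn.Linear(width, width), nn.ReLU(),
            nn.Linear(width, 1),
        )

    def forward(self, x):
        proj = 2 * torch.pi * x @ self.B.T
        feats = torch.cat([torch.sin(proj), torch.cos(proj)], dim=-1)
        return self.net(feats)

# Fit a toy 1D signal from its coordinates alone (placeholder target).
coords = torch.linspace(0, 1, 256).unsqueeze(-1)
signal = torch.sin(8 * torch.pi * coords)
inr = FourierINR()
opt = torch.optim.Adam(inr.parameters(), lr=1e-3)
for _ in range(500):
    opt.zero_grad()
    loss = ((inr(coords) - signal) ** 2).mean()
    loss.backward()
    opt.step()
```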

Second, we offer an extensive study of the conditions required for direct model editing in the weight space. Our analysis introduces the concept of weight disentanglement as the crucial factor enabling task-specific adjustments via task arithmetic. This property emerges during pre-training and is evident when distinct directions in weight space govern separate, localized regions of the input space. Significantly, we find that linearizing models by fine-tuning them in their tangent space enhances weight disentanglement, leading to performance improvements across editing benchmarks and models.
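
A minimal sketch of task arithmetic on parameter state dicts, assuming hypothetical pre-trained and fine-tuned checkpoints: a task vector is the difference between fine-tuned and pre-trained weights, and edits are linear combinations of task vectors added back to the pre-trained model.

```python
# Hypothetical sketch of task arithmetic: edits applied directly in
# weight space. The state-dict interface and coefficients are assumed.
import torch

def task_vector(pretrained, finetuned):
    """tau = theta_ft - theta_pre, computed per parameter tensor."""
    return {k: finetuned[k] - pretrained[k] for k in pretrained}

def apply_edit(pretrained, task_vectors, alphas):
    """theta = theta_pre + sum_i alpha_i * tau_i. Weight disentanglement
    is what lets each tau_i act on its own region of input space without
    interfering with the others."""
    edited = {k: v.clone() for k, v in pretrained.items()}
    for tau, alpha in zip(task_vectors, alphas):
        for k in edited:
            edited[k] += alpha * tau[k]
    return edited

# usage: add task A, negate (forget) task B
# edited = apply_edit(theta_pre, [tau_a, tau_b], alphas=[1.0, -1.0])
```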

In summary, our work unveils fresh insights into the fundamental links between weight space and function space, proposing a general framework for approximating inductive biases.

dc.description.sponsorship

LTS4

dc.identifier.doi

10.5075/epfl-thesis-9898

dc.identifier.uri

https://infoscience.epfl.ch/handle/20.500.14299/202342

dc.language.iso

en

dc.publisher

EPFL

dc.publisher.place

Lausanne

dc.relation

https://infoscience.epfl.ch/record/306613/files/EPFL_TH9898.pdf

dc.size

188

dc.subject

Deep learning science

dc.subject

inductive bias

dc.subject

generalization

dc.subject

neural anisotropy directions

dc.subject

neural tangent kernel

dc.subject

implicit neural representations

dc.subject

model editing

dc.subject

task arithmetic

dc.subject

weight interpolations

dc.title

The inductive bias of deep learning: Connecting weights and functions

dc.type

thesis::doctoral thesis

dspace.entity.type

Publication

dspace.file.type

n/a

dspace.legacy.oai-identifier

oai:infoscience.epfl.ch:306613

epfl.legacy.itemtype

Theses

epfl.legacy.submissionform

THESIS

epfl.oai.currentset

fulltext

epfl.oai.currentset

DOI

epfl.oai.currentset

STI

epfl.oai.currentset

thesis

epfl.oai.currentset

thesis-bn

epfl.oai.currentset

OpenAIREv4

epfl.publication.version

http://purl.org/coar/version/c_970fb48d4fbd8a85

epfl.thesis.doctoralSchool

EDEE

epfl.thesis.faculty

STI

epfl.thesis.institute

IEL

epfl.thesis.jury

Prof. Alexandre Massoud Alahi (president); Prof. Pascal Frossard (thesis director); Prof. Matthieu Wyart, Prof. Fanny Yang, Dr Wieland Brendel (examiners)

epfl.thesis.number

9898

epfl.thesis.originalUnit

LTS4

epfl.thesis.publicDefenseYear

2023-12-05

epfl.writtenAt

EPFL

oaire.licenseCondition

copyright

Files

Original bundle

Name: EPFL_TH9898.pdf
Size: 14.93 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed to upon submission