The Learnability of the Grammar of Jazz: Bayesian Inference of Hierarchical Structures in Harmony

Harasim, Daniel

doi:10.5075/epfl-thesis-10404

doctoral thesis

2020

Musical grammar describes a set of principles that are used to understand and interpret the structure of a piece according to a musical style. The main topic of this study is grammar induction for harmony, that is, the process of learning structural principles from the observation of chord sequences. The question of how grammars can be learned by induction from sequential data is an instance of the more general question of how abstract knowledge can be induced from the observation of data, a central question of cognitive science. Under the assumption that human learning approximately follows the principles of rational reasoning, Bayesian models of cognition can be used to simulate learning processes. This study investigates what prior knowledge makes it possible to learn musical grammar inductively from Jazz chord sequences using Bayesian models and computational simulations.

The theoretical part of the thesis presents how questions about learnability can be studied in a unified framework involving music analysis, cognitive modeling, Bayesian statistics, and computational simulations. A new grammar formalism, called Probabilistic Abstract Context-Free Grammar (PACFG), is proposed that allows for flexible probability models which facilitate the grammar-induction experiments of this study. PACFG can jointly model multiple musical dimensions such as harmony and rhythm, and can use coordinate ascent variational inference for grammar learning.
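As a rough illustration of what a probabilistic grammar over chord symbols assigns to a hierarchical analysis, the following minimal sketch scores a tree with a plain probabilistic context-free grammar. It is not the PACFG formalism of the thesis: the rule set, the probabilities, and the names `RULE_PROBS` and `tree_probability` are hypothetical and chosen only for illustration.

```python
# Toy probabilistic context-free grammar over chord symbols.
# All rules and probabilities below are invented for illustration;
# for each left-hand side the probabilities sum to 1.
RULE_PROBS = {
    "Cmaj7": {("G7", "Cmaj7"): 0.3, ("Cmaj7", "Cmaj7"): 0.2, "terminate": 0.5},
    "G7": {("Dm7", "G7"): 0.4, "terminate": 0.6},
    "Dm7": {"terminate": 1.0},
}


def tree_probability(tree):
    """Return the probability of a derivation tree.

    A tree is a pair (label, children); a leaf has an empty tuple of
    children and is scored with the 'terminate' rule of its label.
    """
    label, children = tree
    if not children:
        return RULE_PROBS[label]["terminate"]
    rhs = tuple(child[0] for child in children)
    prob = RULE_PROBS[label][rhs]
    for child in children:
        prob *= tree_probability(child)
    return prob


# Analysis of the sequence Dm7 G7 Cmaj7: Dm7 prepares G7, which prepares Cmaj7.
tree = ("Cmaj7", (("G7", (("Dm7", ()), ("G7", ()))), ("Cmaj7", ())))
print(tree_probability(tree))
```

The probability of the tree is the product of the probabilities of all rules used in the derivation, which is the quantity a grammar learner would adjust when fitting rule probabilities to data.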

The empirical part of the thesis reports supervised and unsupervised grammar-learning experiments. To train and evaluate grammar models, a ground-truth dataset of hierarchical analyses of complete Jazz standards, called the Jazz Harmony Treebank (JHT), was created. The supervised grammar-learning experiments, in which grammars for Jazz harmony are learned from the JHT analyses, show that jointly modeling harmony and rhythm significantly improves the grammar models' prediction of the ground truth. The performance and robustness of the grammars are further improved by a transpositionally invariant parameterization of rule probabilities. Following the supervised grammar learning, unsupervised grammar learning was performed by inducing harmony grammars merely from Jazz chord sequences, without observing the JHT trees. The results show that the best induced grammar performs about as well as the best supervised grammar. In particular, the goal-directedness of functional harmony does not need to be assumed a priori, but can be learned without the use of music-specific prior knowledge.
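One way to picture a transpositionally invariant parameterization is to express each rule relative to the root of its left-hand side, so that the same harmonic relation in different keys shares one parameter. The sketch below is an assumption-laden illustration, not the thesis's model: chords are encoded as hypothetical (root pitch class, quality) pairs, the helper names are invented, and the smoothing is a posterior mean under a symmetric Dirichlet prior restricted to the rules that were observed.

```python
from collections import Counter, defaultdict


def relative_rule(lhs, rhs):
    """Express a rule relative to the root of its left-hand side.

    Chords are (root pitch class 0-11, quality) pairs (an assumed encoding).
    """
    lhs_root, lhs_quality = lhs
    rel_rhs = tuple(((root - lhs_root) % 12, quality) for root, quality in rhs)
    return lhs_quality, rel_rhs


def estimate(rules, alpha=0.5):
    """Estimate rule probabilities from observed rule applications.

    Uses the posterior mean under a symmetric Dirichlet(alpha) prior over
    the observed right-hand sides of each left-hand-side quality.
    """
    counts = defaultdict(Counter)
    for lhs, rhs in rules:
        quality, rel_rhs = relative_rule(lhs, rhs)
        counts[quality][rel_rhs] += 1
    probs = {}
    for quality, counter in counts.items():
        total = sum(counter.values()) + alpha * len(counter)
        probs[quality] = {rhs: (n + alpha) / total for rhs, n in counter.items()}
    return probs


# Hypothetical rule applications read off from tree analyses.
observed = [
    ((0, "maj7"), ((7, "7"), (0, "maj7"))),     # Cmaj7 -> G7 Cmaj7
    ((5, "maj7"), ((0, "7"), (5, "maj7"))),     # Fmaj7 -> C7 Fmaj7
    ((0, "maj7"), ((0, "maj7"), (0, "maj7"))),  # Cmaj7 -> Cmaj7 Cmaj7
]
probs = estimate(observed)
for rhs, p in probs["maj7"].items():
    print(rhs, p)
```

In this toy data the dominant preparation in C and the one in F are counted as the same relative rule, so evidence from one key informs the estimate for all keys.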

The findings of this thesis show that general prior knowledge enables an ideal learner to acquire abstract musical principles by statistical learning. In conclusion, it is plausible that many aspects of musical grammar have been learned by Jazz musicians and listeners, instead of being innate predispositions or explicitly taught concepts.

This thesis is also embedded in the context of empirical music research and digital humanities. Current studies either describe complex musical structures qualitatively or investigate simpler aspects quantitatively. The computational models developed in this thesis demonstrate that deep insights into music and statistical analyses are not mutually exclusive. They enable a new kind of data-driven music theory and musicology, for instance through comparative analyses of musical grammar for different styles such as Jazz, Rock, and Western classical music.

Name: EPFL_TH10404.pdf
Type: N/a
Access type: openaccess
License Condition: Copyright
Size: 3.69 MB
Format: Adobe PDF
Checksum (MD5): 6351392578a8a7937cfc5c98e0fa9126