This thesis presents a comprehensive theory of information-theoretic measures for characterizing the complexity and higher-order dependence structures of graphs. We introduce multivariate complexity and dependence measures for graphs within the framework of multivariate graph limits and develop consistent estimators for these measures by determining their large-sample properties. Additionally, we explore causal relationships in graphs through the application of these measures. These approaches represent the first attempt in the literature to explore multivariate dependencies and causal discovery in graphs.
We begin by addressing the problem of quantifying the complexity of the generating mechanism of a single exchangeable graph with a smooth graphon representative. By formulating a consistent estimator of graphon entropy, we capture properties inherent in the graph-generating mechanism, rather than focusing on specific graph features. We also provide customized graphon entropy estimators for widely studied random graph models such as Erdos-Rényi, Chung-Lu, and the stochastic block model, establishing a Central Limit Theorem for the first two and determining the convergence rate for the third.
Next, we introduce multivariate information-theoretic measures for multiple graphs sharing the same node set, which can be modeled as a multiplex graph. Starting with two graphs, we introduce the graphon mutual information to assess the dependence between two graph-generating mechanisms. We expand this framework to include multivariate measures such as graphon total correlation, dual total correlation, interaction information, and O-information, along with their conditional variants, to assess various dependence structures among multiple graphs. These measures offer insights into the collective dependencies and higher-order interactions when analyzing multiple graphs simultaneously, quantifying concepts of synergy and redundancy within multiple graphs.
Subsequently, we provide consistent estimators for these measures alongside convergence rates. Through simulation studies, we demonstrate the performance of these estimators and provide examples of redundant and synergistic behaviors in graph collections. Real-world examples show the potential of our estimators in uncovering shared information and complex dependence structures among graphs, enabling assessments of independence and conditional independence across entire graph structures.
Finally, we take initial steps toward developing a theory of causal discovery for graphs by conceptualizing directed acyclic graphs (DAGs) of graphs, where each node represents an individual graph. We apply the introduced measures to differentiate between Markov equivalence classes of path DAGs with three nodes and explore the interactions between synergy, redundancy, and causal structures. Experiments on simulated graphs illustrate the potential of our methods for learning causal relationships, paving the way for future research in causal discovery for graph-valued data.
Prof. Anthony Christopher Davison (président) ; Prof. Sofia Charlotta Olhede (directeur de thèse) ; Prof. Kathryn Hess Bellwald, Prof. Peter Orbanz, Prof. Ian Dryden (rapporteurs)
2024
Lausanne
2024-09-27
10713
158