Understanding causal relationships lies at the heart of scientific inquiry. Across disciplines, from medicine and public health to economics (Card, 1993; Angrist and Krueger, 1991), social sciences (Rosenbaum and Rubin, 1983; Imbens, 2024), epidemiology (Robins et al., 2000), and artificial intelligence (Pearl, 2009), researchers seek to go beyond correlation and uncover the underlying mechanisms that govern observed phenomena. Identification and estimation of causal effects are central to this endeavor, enabling the systematic evaluation of how interventions, treatments, or policy changes influence outcomes.
In this thesis, we first revisit the so-called "positivity assumption" in the problem of identifying causal effects. This assumption requires that all accessible observational and interventional distributions have full support. We show that without it, the rules of do-calculus and existing graphical criteria for causal effect identifiability are not sound, i.e., they may yield an identification formula for a causal effect that cannot be identified when positivity does not hold. Next, we revisit the general identifiability problem (gID) introduced by Lee et al. (2019) and its extension to the identifiability of conditional causal effects (c-gID) (Lee et al., 2020; Correa et al., 2021). For these problems, we propose algorithms for identifying the target causal effect and provide new proofs of their soundness and completeness under the positivity assumption, an assumption not stated in the original studies. A useful feature of these algorithms is that they reveal a simple connection between the gID/c-gID problems and the classical identifiability problem (ID) (Shpitser and Pearl, 2006b,a): they decompose a gID/c-gID instance into a series of ID sub-problems. Finally, we go beyond the setting of gID by removing all restrictions on the structure of the accessible interventional distributions, a setting we call the universal identifiability problem (uID). Surprisingly, we demonstrate that for this problem the rules of do-calculus are no longer complete. To prove this, we provide a rigorous formalization of what it means for a causal effect to be computable via do-calculus (i.e., for do-calculus to be complete). Based on this formalization, we also show that the completeness results of previous works (Shpitser and Pearl, 2006b,a; Huang and Valtorta, 2006a; Kivva et al., 2022, 2023) are consistent with our formalism.
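Positivity matters because identification formulas condition on treatment values within strata of other variables. As a minimal illustration, the sketch below uses back-door adjustment, P(y | do(x)) = Σ_z P(y | x, z) P(z), rather than the do-calculus counterexamples studied in the thesis. The helper names `backdoor_adjust` and `make_table`, the table layout `p_zxy[z, x, y]`, and the toy model are all hypothetical choices made for this example.

```python
import numpy as np


def backdoor_adjust(p_zxy, x):
    """Compute P(Y | do(X=x)) = sum_z P(Y | X=x, Z=z) P(Z=z).

    p_zxy[z, x, y] is a joint probability table over discrete Z, X, Y
    (a hypothetical layout chosen for this sketch). Raises ValueError when
    positivity fails: P(X=x, Z=z) = 0 for some z with P(Z=z) > 0.
    """
    p_z = p_zxy.sum(axis=(1, 2))  # marginal P(z)
    p_zx = p_zxy.sum(axis=2)      # joint P(z, x)
    out = np.zeros(p_zxy.shape[2])
    for z in range(p_zxy.shape[0]):
        if p_z[z] == 0:
            continue  # stratum never occurs, so it contributes nothing
        if p_zx[z, x] == 0:
            # P(Y | X=x, Z=z) is undefined: no data with this (x, z) pair
            raise ValueError(f"positivity violated: P(X={x}, Z={z}) = 0")
        out += p_zxy[z, x, :] / p_zx[z, x] * p_z[z]
    return out


def make_table(p_x_eq_z):
    """Toy model: Z ~ Bernoulli(0.5), X = Z with probability p_x_eq_z,
    and P(Y=1 | x, z) = 0.2 + 0.5 x + 0.2 z."""
    p = np.zeros((2, 2, 2))
    for z in (0, 1):
        for x in (0, 1):
            p_x = p_x_eq_z if x == z else 1.0 - p_x_eq_z
            p_y1 = 0.2 + 0.5 * x + 0.2 * z
            p[z, x, 1] = 0.5 * p_x * p_y1
            p[z, x, 0] = 0.5 * p_x * (1.0 - p_y1)
    return p


p_pos = make_table(0.8)  # every (z, x) pair has positive probability
p_det = make_table(1.0)  # X is a deterministic copy of Z: positivity fails

print(backdoor_adjust(p_pos, 1))  # approximately [0.2, 0.8]
try:
    backdoor_adjust(p_det, 1)
except ValueError as err:
    print(err)
```

In the degenerate table, P(Y | X=1) is still defined observationally, but no unit with Z=0 ever received X=1, so the term P(Y | X=1, Z=0) needed by the formula cannot be estimated from these data.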
In the second part of this thesis, we consider a special case of causal effect identification under the assumption of a linear structural causal model (SCM), namely the estimation of the causal effect of a treatment on an outcome. Several methods, such as the difference-in-differences (DiD) estimator and negative control outcomes, have been proposed in the literature for this setting. However, these approaches either require restrictive assumptions on the data-generating model or need access to at least two proxy variables. In contrast, we propose a method that estimates the causal effect using cross moments when only one proxy variable is available. We also investigate the problem of treatment effect estimation when observational data are available from multiple environments but no additional information, such as proxy variables, is present in the system. For this case, we show that under some mild assumptions the treatment effect is identifiable.
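For intuition about how higher-order cross moments can stand in for a second proxy, here is a sketch under an assumed linear model with one latent confounder U: Z = δU + e_z, T = αU + e_t, Y = βT + γU + e_y, with independent, zero-mean U and noises, E[U³] ≠ 0, α, δ ≠ 0 and Var(e_t) > 0. Under these assumptions, second and third cross moments give β in closed form (derived here for illustration); the estimator in the thesis may differ. The function name `cross_moment_effect` and all parameter values are hypothetical.

```python
import numpy as np


def cross_moment_effect(z, t, y):
    """Estimate beta from samples of a proxy Z, treatment T and outcome Y.

    Assumed (hypothetical) linear model with latent confounder U:
        Z = delta*U + e_z,  T = alpha*U + e_t,  Y = beta*T + gamma*U + e_y,
    with U, e_z, e_t, e_y independent, E[U^3] != 0 and alpha, delta != 0.
    """
    z, t, y = (v - v.mean() for v in (z, t, y))  # center the data

    def m(*vs):
        # Sample cross moment E[v1 * v2 * ...].
        return float(np.mean(np.prod(vs, axis=0)))

    # beta = (E[TY]E[Z^2 T] - E[ZY]E[Z T^2]) / (E[T^2]E[Z^2 T] - E[ZT]E[Z T^2])
    num = m(t, y) * m(z, z, t) - m(z, y) * m(z, t, t)
    den = m(t, t) * m(z, z, t) - m(z, t) * m(z, t, t)
    return num / den


rng = np.random.default_rng(0)
n = 1_000_000
alpha, delta, gamma, beta = 1.0, 1.0, 2.0, 1.5
u = rng.exponential(1.0, n) - 1.0  # skewed confounder: mean 0, E[U^3] = 2
z = delta * u + rng.normal(size=n)
t = alpha * u + rng.normal(size=n)
y = beta * t + gamma * u + rng.normal(size=n)

beta_hat = cross_moment_effect(z, t, y)
beta_ols = np.cov(t, y)[0, 1] / np.var(t, ddof=1)  # ignores the confounder
print(f"cross-moment: {beta_hat:.3f}  naive OLS: {beta_ols:.3f}")
```

With these settings the cross-moment estimate should be close to β = 1.5, whereas the OLS slope converges to β + γα Var(U) / (α² Var(U) + Var(e_t)) = 2.5 because of confounding. The skewness requirement is essential in this sketch: if E[U³] = 0, numerator and denominator both vanish in the population.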