Large Language Models (LLMs) have driven transformative advances across numerous tasks, excelling in text generation, summarization, and multimodal scenarios such as video description and generation. Yet a fundamental challenge persists: enabling AI systems to reason robustly about causality. This thesis focuses on commonsense causality -- the everyday, intuitive form of causal judgment humans employ in real-world scenarios. Commonsense causality is inherently uncertain and dynamically shaped by new context and information: human reasoners routinely update their causal beliefs as evidence emerges, treating causation as a graded belief rather than a binary fact.
Seminal theories in psychology and philosophy, from Fritz Heider's causal attribution to Harold Kelley's "discounting" and Patricia Cheng's "causal power" model, emphasize this dynamic nature: humans typically identify a principal cause but remain sensitive to uncertain factors that may strengthen or weaken their belief.
Building on these insights, we introduce the Belief of Causation with Uncertain Factors (BoCUF) framework, which models a primary causal link alongside two categories of uncertain factors: supporters (reinforcing the causal pathway) and defeaters (diminishing or blocking it). To study these factors systematically, we construct the $\delta$-CAUSAL dataset, the first benchmark of real-world cause-effect pairs explicitly annotated with context-sensitive supporters and defeaters across 10 domains.
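Concretely, each BoCUF instance can be pictured as a tuple (the notation below is illustrative rather than the thesis's formal definition):
\[
\mathcal{B} = \bigl(c,\; e,\; S,\; D\bigr), \qquad S = \{s_1, \dots, s_m\}, \quad D = \{d_1, \dots, d_n\},
\]
where $c$ and $e$ denote the cause and effect of the primary causal link $c \rightarrow e$, each supporter $s_i \in S$ raises the believed strength of that link, and each defeater $d_j \in D$ lowers it.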
To quantify causal beliefs on a continuous scale, we propose CESAR (Causal Embedding Association with Attention Rating), a token-level scoring method that leverages attention-based embeddings to measure the intensity of causal belief while explicitly modeling the influence of supporters and defeaters. Complementing CESAR, we introduce conditional dichotomy quantification, which systematically captures the opposition between supporters and defeaters of the same causal content. Together, CESAR and conditional dichotomy quantification operationalize the quantitative dimensions of the BoCUF framework, offering precise tools for modeling graded causal belief and causal opposition.
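As a rough, purely schematic illustration (the symbols and the aggregation below are expository assumptions, not the exact definition used in the thesis), CESAR can be thought of as aggregating attention-weighted associations between cause and effect tokens:
\[
\mathrm{CESAR}(c, e) \;\approx\; \sum_{i \in c} \sum_{j \in e} a_{ij}\, \phi\!\left(\mathbf{h}_i, \mathbf{h}_j\right),
\]
where $a_{ij}$ is an attention weight linking token $i$ of the cause to token $j$ of the effect, $\mathbf{h}_i$ and $\mathbf{h}_j$ are their contextual embeddings, and $\phi$ is a token-level association score such as cosine similarity. In this picture, appending a supporter to the context should push the score up and appending a defeater should pull it down, and conditional dichotomy quantification contrasts the two conditioned scores, e.g., $\mathrm{CESAR}(c, e \mid s)$ versus $\mathrm{CESAR}(c, e \mid d)$, for the same cause-effect pair.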
Extending the framework, we introduce causal epistemic consistency, a new metric of model self-consistency. It quantifies how coherently an AI system reasons about uncertain factors, exposing cases where a model classifies the same factor differently across contexts. This yields a more comprehensive assessment of an LLM's reliability, especially in uncertainty-rich, real-world scenarios that demand robust causal reasoning.
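One minimal sketch of such a consistency check (the thesis defines the metric precisely; the pairwise-agreement form below is only an illustrative assumption) elicits the model's judgment of the same factor $u$ under $k$ different contexts and measures how often those judgments agree:
\[
\mathrm{Consistency}(u) \;=\; \frac{1}{\binom{k}{2}} \sum_{1 \le a < b \le k} \mathbb{1}\bigl[\ell_a(u) = \ell_b(u)\bigr],
\]
where $\ell_a(u) \in \{\text{supporter}, \text{defeater}\}$ is the label assigned to $u$ in context $a$; a value of $1$ means the model never contradicts itself about $u$, while lower values flag exactly the kind of inconsistency the metric is designed to surface.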
In summary, this thesis makes five key contributions: (i) the BoCUF framework for modeling commonsense causality, (ii) the $\delta$-CAUSAL benchmark dataset, (iii) the CESAR method for quantifying graded causal belief, (iv) the conditional dichotomy quantification framework, and (v) the evaluation of causal epistemic consistency. Together, these contributions build a comprehensive foundation for modeling human-like causal reasoning under uncertainty, advancing AI systems toward more context-sensitive causal understanding.