Informed machine learning models for advancing cardiac disease prognosis
Coronary artery disease (CAD) is the leading cause of death globally, driven by the accumulation of atherosclerotic plaques that restrict blood flow to the heart. In acute cases, plaque rupture can lead to myocardial infarction (MI), causing irreversible damage, arrhythmias, or death. Accurate prognosis and monitoring of CAD, in both its acute and chronic phases, are therefore critical for improving patient outcomes.
This thesis investigates how expert knowledge can be integrated into machine learning (ML) models to enhance cardiac risk prediction. We address two distinct prognostic settings:
- Acute events: Symptomatic patients with cardiovascular risk factors undergo coronary angiography. The goal is to predict whether a treated or seemingly benign lesion may cause future cardiac events.
- Chronic progression: The goal is to continuously monitor latent biomarkers that cannot be measured directly and that are associated with the gradual progression of cardiac damage.
For the acute setting, we introduce a multimodal convolutional neural network (CNN) framework that incorporates expert knowledge via artery-level attention masks on invasive coronary angiography (ICA) images, combined with clinical variables. These fused artery-level representations are used to predict future MI risk at the patient level. The model outperforms single-modality baselines and experienced cardiologists, demonstrating the value of combining anatomical priors with multimodal data, even with limited MI cases.
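The fusion idea in this paragraph can be illustrated with a minimal NumPy sketch. It is not the thesis architecture: the function names `masked_pool` and `fuse`, the feature-map shape, and the three clinical values are hypothetical, and a random array stands in for the output of a CNN. The sketch only shows how an artery mask can restrict pooling to the vessel region before the image features are concatenated with clinical variables.

```python
import numpy as np

def masked_pool(feature_map, mask, eps=1e-8):
    """Average a (C, H, W) feature map over the pixels selected by an (H, W) mask."""
    weights = mask / (mask.sum() + eps)          # normalise mask to sum to ~1
    return (feature_map * weights[None, :, :]).sum(axis=(1, 2))

def fuse(feature_map, artery_mask, clinical):
    """Concatenate mask-pooled image features with a clinical-variable vector."""
    return np.concatenate([masked_pool(feature_map, artery_mask), clinical])

rng = np.random.default_rng(0)
feature_map = rng.normal(size=(4, 8, 8))         # stand-in for CNN features (assumption)
artery_mask = np.zeros((8, 8))
artery_mask[2:5, 3:6] = 1.0                      # hypothetical artery region
clinical = np.array([0.63, 1.0, 0.0])            # hypothetical clinical variables

fused = fuse(feature_map, artery_mask, clinical) # 4 image features + 3 clinical values
```

In a trained model the fused vector would feed a classifier head. Here it only demonstrates that pixels outside the mask do not contribute to the artery-level representation.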
Motivated by studies linking coronary geometry and stenosis characteristics with cardiac risk, we propose a second model leveraging graph neural networks (GNNs) to capture geometric and textural features of coronary lesions. A self-attention mechanism allows the model to account for interactions among multiple lesions within a patient. This GNN-based model outperforms clinical markers and conventional ML baselines at both lesion and patient levels, and shows partial generalization to external datasets. Performance differences between studies are driven more by cohort characteristics than by modeling approach, underscoring the challenges of limited and cohort-specific annotated data.
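To make the lesion-interaction step concrete, the following is a minimal sketch of scaled dot-product self-attention over the lesion embeddings of one patient. It assumes that each lesion has already been encoded as a fixed-size vector (for example by a GNN). The function `lesion_self_attention`, the random projection matrices, and the mean-pooling step are illustrative assumptions, not the model described in the thesis.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def lesion_self_attention(lesions, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over (n_lesions, d) embeddings."""
    q, k, v = lesions @ w_q, lesions @ w_k, lesions @ w_v
    scores = q @ k.T / np.sqrt(k.shape[1])       # pairwise lesion-lesion scores
    attn = softmax(scores, axis=1)               # each row sums to 1
    return attn @ v, attn

rng = np.random.default_rng(0)
n_lesions, d = 3, 5                              # hypothetical sizes
lesions = rng.normal(size=(n_lesions, d))        # stand-in for GNN lesion embeddings
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))

updated, attn = lesion_self_attention(lesions, w_q, w_k, w_v)
patient_embedding = updated.mean(axis=0)         # simple patient-level pooling (assumption)
```

Each row of `attn` indicates how strongly one lesion attends to every lesion of the same patient, which is what allows a patient-level prediction to depend on combinations of lesions rather than on each lesion in isolation.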
While angiography-based models can predict future events, they offer limited insight into the intermediate period between imaging and outcome. To address this, the final part of the thesis explores continuous biomarker monitoring through physics-informed ML. We investigate two methods:
- Hybrid approach: Combines a neural surrogate of a forward simulator with a data-driven correction module to estimate a cardiac biomarker using a 1D wave propagation model. Preliminary results are promising, but the approach fails to scale to complex settings due to identifiability issues with the surrogate.
- Simulation-based inference: Directly estimates the posterior from simulated observation-parameter pairs, avoiding the need for a surrogate model. This method uses entropy-regularized optimal transport to correct for simulator mismatch via domain transfer.
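As an illustration of entropy-regularized optimal transport, the sketch below uses Sinkhorn iterations, a standard solver for this problem, to couple a small set of simulated samples with a small set of observed samples. Whether the thesis uses this solver, this cost, or a barycentric mapping is not stated in the abstract. The function `sinkhorn`, the sample sizes, the regularization strength, and the synthetic Gaussian data are all assumptions made for the example.

```python
import numpy as np

def sinkhorn(a, b, cost, reg=0.5, n_iter=200):
    """Entropy-regularized OT plan between histograms a and b via Sinkhorn scaling."""
    K = np.exp(-cost / reg)                      # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)                        # match column marginals
        u = a / (K @ v)                          # match row marginals
    return u[:, None] * K * v[None, :]

rng = np.random.default_rng(0)
simulated = rng.normal(0.0, 1.0, size=(6, 2))    # synthetic "simulator" samples
observed = rng.normal(0.5, 1.0, size=(5, 2))     # synthetic "real" samples

# Squared Euclidean cost, rescaled to [0, 1] for numerical stability.
cost = ((simulated[:, None, :] - observed[None, :, :]) ** 2).sum(axis=2)
cost = cost / cost.max()

a = np.full(6, 1 / 6)                            # uniform weights on simulated samples
b = np.full(5, 1 / 5)                            # uniform weights on observed samples
plan = sinkhorn(a, b, cost)

# Barycentric projection: move each simulated sample toward the observed domain.
transported = (plan @ observed) / plan.sum(axis=1, keepdims=True)
```

A larger `reg` gives a smoother, more diffuse coupling and faster convergence. A smaller value approaches unregularized transport but may need more iterations or log-domain updates to stay numerically stable.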
In summary, this thesis demonstrates the value of integrating domain knowledge and inductive biases into ML models for cardiac prognosis. It highlights practical and methodological challenges in real-world clinical data and points toward future directions such as improving generalization across populations, enhancing prognostic interpretability, and robustly inferring latent variables, cardiac and beyond, in low-data, noisy-label settings.