Abstract

Advances in scanning systems have enabled the digitization of pathology slides into Whole-Slide Images (WSIs), opening up opportunities to develop Computational Pathology (CompPath) methods for computer-aided cancer diagnosis and prognosis. Building on recent successes in Computer Vision, CompPath has primarily been developed using models based on Convolutional Neural Networks (CNNs), and a series of promising approaches have been proposed for nuclei segmentation and classification, tumor detection, and tumor grading, among other tasks. However, CNN-based methods suffer from several limitations. First, it is challenging to model both fine-grained nuclei-level information and long-range inter-glandular dependencies. Second, there is a discrepancy between the pixel-based analysis of CNNs and the histological entity-centered analysis employed for pathological diagnosis, which in turn can hinder model transparency. Third, the inherent complexity of training networks on large histology images with limited annotations constrains their learning capabilities.

Instead, we propose an analytical paradigm shift, in which we view and analyze histology images as a set of biological entities interacting with each other. Specifically, an image is represented as an entity-graph whose nodes depict biological entities and whose edges encode interactions between these entities. Entity-graphs are further processed by a Graph Neural Network (GNN) model that jointly encodes the morphological attributes and topological distribution of the entities for tissue phenotyping. In this thesis, we study three research directions in CompPath: scalability, interpretability and explainability, and weakly-supervised learning.

First, histology images are orders of magnitude larger than natural images, and diagnostically relevant regions may represent only a fraction of the image. We propose a scalable hierarchical cell-to-tissue representation (HACT) and a GNN model, HACT-Net, for learning on arbitrarily large inputs. We demonstrate the capabilities of HACT-Net on our proposed BRACS dataset, the largest cohort to date for breast tumor region-of-interest subtyping.

Second, computer-aided diagnostic tools must be transparent and their decision-making process justified. By shifting the analysis from pixel- to entity-based, we make the input space interpretable to pathologists, who can better relate to the model input. We further propose entity-centric graph explainers, exemplified on cell-graph models, along with novel metrics to evaluate explanations based on entity-level pathological concepts.

Third, acquiring ground-truth data to train deep learning systems requires pathologists to provide specific annotations, which is time-consuming, expensive, and subject to inter- and intra-observer variability. We therefore propose WholeSIGHT, a method that reduces annotation requirements to WSI-level labels only, for joint classification and segmentation of WSIs. We demonstrate the capabilities of WholeSIGHT for Gleason pattern segmentation and grading on multi-source prostate WSI datasets. The generalization properties of WholeSIGHT are further evaluated on unseen cohorts and compared against Bayesian variants to strengthen the estimation of model uncertainty.

Finally, we introduce HistoCartography, a novel Python library designed to accelerate the development of graph analytics in CompPath.
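To make the entity-graph paradigm described above concrete, the following is a minimal, self-contained sketch in plain PyTorch: nuclei become graph nodes with feature vectors, a k-nearest-neighbor rule over nucleus centroids supplies the edges, and one mean-aggregation message-passing layer with a graph-level readout produces a tissue-subtype prediction. The feature dimensions, the kNN rule, the layer sizes, and the number of classes are illustrative assumptions; this is not the exact HACT-Net architecture nor the HistoCartography API.

```python
# Illustrative sketch of the cell-graph -> GNN -> tissue-label pipeline.
# All names, feature choices, and sizes are assumptions for demonstration.
import torch

def knn_edges(centroids: torch.Tensor, k: int = 5) -> torch.Tensor:
    """Connect each nucleus to its k nearest neighbors (toy topology)."""
    dists = torch.cdist(centroids, centroids)      # (N, N) pairwise distances
    dists.fill_diagonal_(float("inf"))             # exclude self-matches
    knn = dists.topk(k, largest=False).indices     # (N, k) neighbor indices
    src = torch.arange(centroids.size(0)).repeat_interleave(k)
    return torch.stack([src, knn.reshape(-1)])     # (2, N*k) edge list

class CellGraphGNN(torch.nn.Module):
    """One mean-aggregation message-passing layer + mean readout."""
    def __init__(self, in_dim: int, hidden_dim: int, n_classes: int):
        super().__init__()
        self.msg = torch.nn.Linear(in_dim, hidden_dim)
        self.cls = torch.nn.Linear(hidden_dim, n_classes)

    def forward(self, x: torch.Tensor, edges: torch.Tensor) -> torch.Tensor:
        src, dst = edges
        agg = torch.zeros_like(x).index_add_(0, dst, x[src])  # sum neighbor features
        deg = torch.bincount(dst, minlength=x.size(0)).clamp(min=1)
        h = torch.relu(self.msg(agg / deg.unsqueeze(1)))      # mean aggregation
        return self.cls(h.mean(dim=0, keepdim=True))          # graph-level logits

# Toy example: 30 nuclei, each with a 16-d morphology/appearance embedding.
centroids = torch.rand(30, 2)      # nucleus positions in the image (assumed)
features = torch.randn(30, 16)     # per-nucleus entity features (assumed)
edges = knn_edges(centroids, k=5)  # spatial interactions between entities
logits = CellGraphGNN(16, 32, 3)(features, edges)  # 3 hypothetical subtypes
```

The design mirrors the abstract's framing: morphology enters through node features, topology through the edge list, and the GNN fuses both before a readout maps the whole entity-graph to a tissue phenotype. HACT extends this single-level picture with a second, coarser tissue-level graph linked to the cell-level one.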
