Exploring Query-to-reference Mapping Challenges for Automated Single-Cell Atlas-based Diagnostics
Single-cell atlases are built by integrating multiple heterogeneous datasets into a common embedding space. The aim is to reduce dataset-specific biases, or batch effects, while preserving the overall cellular composition and biological variability. One envisioned application is automated diagnostics, in which atlases serve as references for predicting the phenotype of unseen patients. Here, we developed a diagnostic tool based on a multi-disease atlas of inflammation and benchmarked state-of-the-art integration methods for mapping and classifying unseen patients. In our tests, all methods performed well when the query's batch effects were well represented in the reference, but mostly failed otherwise. Notably, linear integration approaches were more robust and less sensitive to hyperparameters than the more expressive variational autoencoder-based methods. These findings highlight two fundamental challenges: selecting the optimal integration method, and handling previously unobserved batch effects when classifying new query patients. As a viable solution, we designed and tested a Centralized experimental scenario, in which reference and query datasets are generated in the same center, demonstrating a potential pathway toward reliable atlas-based diagnostics.
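The abstract does not spell out the mapping pipeline, so the following is only a minimal sketch, under stated assumptions, of what linear query-to-reference mapping and patient classification can look like. It fits PCA on reference cells only, projects query cells with the reference's own mean and loadings, transfers disease labels by k-nearest-neighbor vote, and aggregates per-cell calls into a patient-level prediction by majority vote. This is not the authors' tool and not any of the benchmarked integration methods (it performs no batch correction at all). The function names, the synthetic data, the majority-vote aggregation, and the parameters `n_components` and `k` are all hypothetical.

```python
import numpy as np


def fit_reference_pca(ref, n_components):
    """Center the reference matrix and return (mean, loadings) for its top principal axes."""
    mean = ref.mean(axis=0)
    # Rows of vt are principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(ref - mean, full_matrices=False)
    return mean, vt[:n_components].T


def project(x, mean, loadings):
    """Map cells into the reference embedding using the reference's mean and loadings."""
    return (x - mean) @ loadings


def knn_labels(ref_emb, ref_labels, query_emb, k):
    """Label each query cell by majority vote among its k nearest reference cells (Euclidean)."""
    dists = np.linalg.norm(query_emb[:, None, :] - ref_emb[None, :, :], axis=2)
    neighbors = np.argsort(dists, axis=1)[:, :k]
    votes = ref_labels[neighbors]
    # bincount requires non-negative integer labels; ties resolve to the smallest label.
    return np.array([np.bincount(v).argmax() for v in votes])


def diagnose_patient(query_cells, ref_cells, ref_labels, n_components=10, k=15):
    """Hypothetical patient-level call: majority vote over per-cell disease predictions."""
    mean, loadings = fit_reference_pca(ref_cells, n_components)
    cell_preds = knn_labels(
        project(ref_cells, mean, loadings),
        ref_labels,
        project(query_cells, mean, loadings),
        k,
    )
    return int(np.bincount(cell_preds).argmax()), cell_preds


# Synthetic example (hypothetical data): two disease states, 50 genes.
rng = np.random.default_rng(0)
n_genes = 50
centers = rng.normal(0.0, 3.0, size=(2, n_genes))
ref = np.vstack([c + rng.normal(size=(200, n_genes)) for c in centers])
labels = np.repeat([0, 1], 200)

# An unseen patient with disease 1, measured without any batch shift.
query = centers[1] + rng.normal(size=(100, n_genes))
pred, cell_preds = diagnose_patient(query, ref, labels)

# The same patient with a batch shift absent from the reference; this shift is
# deliberately aligned with the disease axis to make the failure mode obvious.
batch_shift = centers[0] - centers[1]
pred_shifted, _ = diagnose_patient(query + batch_shift, ref, labels)
```

Because the projection is fixed by the reference, a query-specific shift that the reference never contained passes straight into the embedding. In the contrived case above, where the shift happens to align with the disease axis, the patient is confidently misclassified. Integration methods try to remove such shifts, which is precisely what becomes difficult when the query's batch effect is not represented in the reference.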
Published version, open access, license: CC BY