Abstract

Inferring the source of a diffusion in a large network of agents is a difficult but feasible task, if a few agents act as sensors revealing the time at which they got hit by the diffusion. One of the main limitations of current source identification algorithms is that they assume full knowledge of the contact network, which is rarely the case, especially for epidemics, where the source is called patient zero. Inspired by recent implementations of contact tracing algorithms, we propose a new framework, which we call Source Identification via Contact Tracing Framework (SICTF). In the SICTF, the source identification task starts at the time of the first hospitalization, and initially we have no knowledge about the contact network other than the identity of the first hospitalized agent. We may then explore the network by contact queries, and obtain symptom onset times by test queries in an adaptive way, i.e., both contact and test queries can depend on the outcome of previous queries. We also assume that some of the agents may be asymptomatic, and therefore cannot reveal their symptom onset time. Our goal is to find patient zero with as few contact and test queries as possible. We implement two local search algorithms for the SICTF: the LS algorithm, which has recently been proposed by Waniek et al. in a similar framework, is more data-efficient, but can fail to find the true source if many asymptomatic agents are present, whereas the LS+ algorithm is more robust to asymptomatic agents. By simulations we show that both LS and LS+ outperform previously proposed adaptive and non-adaptive source identification algorithms adapted to the SICTF, even though these baseline algorithms have full access to the contact network. Extending the theory of random exponential trees, we analytically approximate the source identification probability of the LS/ LS+ algorithms, and we show that our analytic results match the simulations. Finally, we benchmark our algorithms on the Data-driven COVID-19 Simulator (DCS) developed by Lorch et al., which is the first time source identification algorithms are tested on such a complex dataset.

Details