On the Privacy of Anonymized Networks

Pedarsani, Pedram; Grossglauser, Matthias

doi:10.1145/2020408.2020596

2011

Abstract

The proliferation of online social networks, and the concomitant accumulation of user data, give rise to hotly debated issues of privacy, security, and control. One specific challenge is the sharing or public release of anonymized data without accidentally leaking personally identifiable information (PII). Unfortunately, it is often difficult to ascertain that sophisticated statistical techniques, potentially employing additional external data sources, are unable to break anonymity. In this paper, we consider an instance of this problem, where the object of interest is the structure of a social network, i.e., a graph describing users and their links. Recent work demonstrates that anonymizing node identities may not be sufficient to keep the network private: the availability of node and link data from another domain, which is correlated with the anonymized network, has been used to re-identify the anonymized nodes. This paper examines the conditions under which such a de-anonymization process is possible. We attempt to shed light on the following question: can we assume that a sufficiently sparse network is inherently anonymous, in the sense that even with unlimited computational power, de-anonymization is impossible? Our approach is to introduce a random graph model for a version of the de-anonymization problem, which is parameterized by the expected node degree and a similarity parameter that controls the correlation between two graphs over the same vertex set. We find simple conditions on these parameters delineating the boundary of privacy, and show that the mean node degree need only grow slightly faster than log n with network size n for nodes to be identifiable. Our results have policy implications for sharing of anonymized network information.
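
To make the setup in the abstract concrete, here is a minimal sketch of one way such a pair of correlated graphs could be generated. The abstract does not spell out the construction, so everything here is an assumption made for illustration: a common Erdos-Renyi base graph, independent edge sampling with probability `s` as the similarity parameter, the function name `sample_correlated_graphs`, and the choice to apply `mean_degree` to the base graph rather than to the observed graphs.

```python
import itertools
import random


def sample_correlated_graphs(n, mean_degree, s, seed=0):
    """Return two correlated edge sets on vertices 0..n-1 and the hidden relabeling.

    Assumed construction (illustrative, not taken from the abstract):
    both graphs are edge-samples of one shared random base graph.
    """
    rng = random.Random(seed)
    # Edge probability that gives the base graph the requested mean degree.
    p = mean_degree / (n - 1)
    # Shared base graph: each vertex pair is an edge independently with probability p.
    base = [e for e in itertools.combinations(range(n), 2) if rng.random() < p]
    # Each observed graph keeps every base edge independently with probability s.
    g1 = {e for e in base if rng.random() < s}
    g2_true = {e for e in base if rng.random() < s}
    # "Anonymize" the second graph by relabeling its vertices with a random permutation.
    perm = list(range(n))
    rng.shuffle(perm)
    g2 = {tuple(sorted((perm[u], perm[v]))) for u, v in g2_true}
    return g1, g2, perm
```

In this sketch, de-anonymization means recovering `perm` from `g1` and `g2` alone. With `s = 1` the two graphs are identical up to relabeling, and smaller values of `s` make them less alike.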