Conference paper

Assembling a Network out of Ambiguous Patches

Many graph mining and network analysis problems rely on the availability of the full network over a set of nodes. But inferring a full network is sometimes non-trivial if the raw data is in the form of many small {\em patches} or subgraphs, of the true network, and if there are ambiguities in the identities of nodes or edges in these patches. This may happen because of noise or because of the nature of data; for instance, in social networks, names are typically not unique. \textit{Graph assembly} refers to the problem of reconstructing a graph from these many, possibly noisy, partial observations. Prior work suggests that graph assembly is essentially impossible in regimes of interest when the true graph is Erdos-Renyi. The purpose of the present paper is to show that a modest amount of clustering is sufficient to assemble even very large graphs. We introduce the $G(n,p;q)$ random graph model, which is the random closure over all open triangles of a $G(n,p)$ Erdos-Renyi, and show that this model exhibits higher clustering than an equivalent Erdos-Renyi. We focus on an extreme case of graph assembly: the patches are small ($1$-hop egonets) and are unlabeled. We show that in realistic regimes, graph assembly is fundamentally feasible, because we can identify, for every edge $e$, a subgraph induced by its neighbors that is unique and present in every patch containing $e$. Using this result, we build a practical algorithm that uses canonical labeling to reconstruct the original graph from noiseless patches. We also provide an achievability result for noisy patches, which are obtained by edge-sampling the original egonets.

Related material