Growing a Graph Matching from a Handful of Seeds
In many graph–mining problems, two networks from different domains have to be matched. In the absence of reliable node attributes, graph matching has to rely on only the link structures of the two networks, which amounts to a generalization of the classic graph isomorphism problem. Graph matching has applications in social–network reconciliation and de-anonymization, protein–network alignment in biology, and computer vision. The most scalable graph–matching approaches use ideas from percolation theory, where a matched node pair “infects” neighbouring pairs as additional potential matches. This class of matching algorithm requires an initial seed set of known matches to start the percolation. The size and correctness of the matching is very sensitive to the size of the seed set. In this paper, we give a new graph–matching algorithm that can operate with a much smaller seed set than previous approaches, with only a small increase in matching errors. We characterize a phase transition in matching performance as a function of the seed set size, using a random bigraph model and ideas from bootstrap percolation theory. We also show the excellent performance in matching several real large-scale social networks, using only a handful of seeds.