Alignment and Assembly: Inferring Networks from Noisy Observations

Yartseva, Lyudmila

doi:10.5075/epfl-thesis-7562

Yartseva, Lyudmila

2017

Télécharger

Formats

Format
BibTeX
MARCXML
TextMARC
MARC
DublinCore
EndNote
NLM
RefWorks
RIS

Files

Résumé

Over recent years, many large network datasets become available, giving rise to novel and valuable applications of data mining and machine learning techniques. These datasets include social networks, the structure of the Internet, and protein-interaction networks, to name just a few. Graph mining exploits information hidden in these data to shed light on such problems as finding relevant pages on the web, or identifying communities of strongly connected individuals. Clearly, to address such problems, we first need the complete and reliable network graph. In many real-world scenarios, the full graph is not available for free. For example, data-collection processes may be noisy and unreliable or node identifiers may be hidden for privacy protection. Therefore, we cannot rely on the node labels to infer the full graph. In this thesis, we address fundamental and practical questions of inferring a true full network from multiple ambiguous observations. We formulate two variations of this problem: network alignment and network assembly. In each variant, we address two types of questions: first, we characterize how graph features impact the fundamental feasibility of reconstruction; second, we seek efficient algorithms that can scale to very large networks. In the first part of this thesis, we consider network alignment. We assume two large, noisy observations of the true network that are not labeled. Network alignment refers to the problem of aligning the vertices of the two networks using only structural cues and it can be viewed as a generalization of the classic graph-isomorphism problem. We make the following contributions. First, we introduce a random bigraph model with parameters p, t and s that generates two correlated graphs. We characterize conditions on p, t and s for the feasibility of alignment of two graphs. Second, we create an algorithm named percolation graph-matching (PGM) that builds an alignment from a small set of pre-matched nodes S. We prove conditions on the parameters p, t , s and r for which PGM succeeds, and we establish a phase transition in |S|. In the second part of this thesis, we consider network assembly. We assume many small, noisy observations of the true network, called patches. The node labels are either absent or not unique. The network assembly problem consists in reconstructing the true graph from these patches. We make the following contributions. First, we introduce a novel random-graph model with parameters p and q that generates a network with high clustering. We characterize conditions on p and q for feasibility of assembly. Second, we propose a heuristic assembly algorithm to reconstruct the true graph from arbitrary patches with label ambiguity.

Détails

Titre Alignment and Assembly: Inferring Networks from Noisy Observations

Auteur(s) Yartseva, Lyudmila

Directeur(s)

Grossglauser, Matthias
Shokrollahi, Mohammad Amin

Pagination 122

Date 2017

Editeur Lausanne, EPFL

Mots-clés (libres)

network analysis; graph mining; complex networks; network reconstruction; graph alignment; graph assembly; random graphs; graph isomorphism; graph matching

Langue Anglais

DOI https://doi.org/10.5075/epfl-thesis-7562

Autres identifiant(s) urn: urn:nbn:ch:bel-epfl-thesis7562-1

Laboratoires INDY1

Le document apparaît dans Production scientifique et compétences > I&C - Faculté Informatique & Communications > IINFCOM > INDY1 - Laboratoire de la Dynamique de l'lnformation et des Réseaux 1
Production scientifique et compétences > Thèses EPFL
Travail produit à l'EPFL
Publié
Thèses

Date de création de la notice 2017-03-28

Files

Résumé

Détails

PDF