Repository logo

Infoscience

  • English
  • French
Log In
Logo EPFL, École polytechnique fédérale de Lausanne

Infoscience

  • English
  • French
Log In
  1. Home
  2. Academic and Research Output
  3. EPFL thesis
  4. Alignment and Assembly : Inferring Networks from Noisy Observations
 
doctoral thesis

Alignment and Assembly : Inferring Networks from Noisy Observations

Yartseva, Lyudmila  
2017

Over recent years, many large network datasets become available, giving rise to novel and valuable applications of data mining and machine learning techniques. These datasets include social networks, the structure of the Internet, and protein-interaction networks, to name just a few. Graph mining exploits information hidden in these data to shed light on such problems as finding relevant pages on the web, or identifying communities of strongly connected individuals. Clearly, to address such problems, we first need the complete and reliable network graph. In many real-world scenarios, the full graph is not available for free. For example, data-collection processes may be noisy and unreliable or node identifiers may be hidden for privacy protection. Therefore, we cannot rely on the node labels to infer the full graph. In this thesis, we address fundamental and practical questions of inferring a true full network from multiple ambiguous observations. We formulate two variations of this problem: network alignment and network assembly. In each variant, we address two types of questions: first, we characterize how graph features impact the fundamental feasibility of reconstruction; second, we seek efficient algorithms that can scale to very large networks. In the first part of this thesis, we consider network alignment. We assume two large, noisy observations of the true network that are not labeled. Network alignment refers to the problem of aligning the vertices of the two networks using only structural cues and it can be viewed as a generalization of the classic graph-isomorphism problem. We make the following contributions. First, we introduce a random bigraph model with parameters p, t and s that generates two correlated graphs. We characterize conditions on p, t and s for the feasibility of alignment of two graphs. Second, we create an algorithm named percolation graph-matching (PGM) that builds an alignment from a small set of pre-matched nodes S. We prove conditions on the parameters p, t , s and r for which PGM succeeds, and we establish a phase transition in |S|. In the second part of this thesis, we consider network assembly. We assume many small, noisy observations of the true network, called patches. The node labels are either absent or not unique. The network assembly problem consists in reconstructing the true graph from these patches. We make the following contributions. First, we introduce a novel random-graph model with parameters p and q that generates a network with high clustering. We characterize conditions on p and q for feasibility of assembly. Second, we propose a heuristic assembly algorithm to reconstruct the true graph from arbitrary patches with label ambiguity.

  • Files
  • Details
  • Metrics
Loading...
Thumbnail Image
Name

EPFL_TH7562.pdf

Access type

openaccess

Size

2.61 MB

Format

Adobe PDF

Checksum (MD5)

66113b9e4cb0717325839e5bc927d6e8

Logo EPFL, École polytechnique fédérale de Lausanne
  • Contact
  • infoscience@epfl.ch

  • Follow us on Facebook
  • Follow us on Instagram
  • Follow us on LinkedIn
  • Follow us on X
  • Follow us on Youtube
AccessibilityLegal noticePrivacy policyCookie settingsEnd User AgreementGet helpFeedback

Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, tous droits réservés