Repository logo

Infoscience

  • English
  • French
Log In
Logo EPFL, École polytechnique fédérale de Lausanne

Infoscience

  • English
  • French
Log In
  1. Home
  2. Academic and Research Output
  3. EPFL thesis
  4. Representing graphs through data with learning and optimal transport
 
doctoral thesis

Representing graphs through data with learning and optimal transport

Petric Maretic, Hermina  
2021

Graphs offer a simple yet meaningful representation of relationships between data. This representation is often used in machine learning algorithms in order to incorporate structural or geometric information about data. However, it can also be used in an inverted fashion: instead of modelling data through graphs, we model graphs through data distributions. In this thesis, we explore several applications of this new modelling framework.

Starting with the graph learning problem, we exploit the probabilistic model of data given through graphs to propose a multi-graph learning method for structured data mixtures. We explore various relations that data can have with the underlying graphs through the notion of graph filters. We propose an algorithm to jointly cluster a set of data and learn a graph for each of the clusters. Experiments demonstrate promising performance in data clustering and multiple graph inference, and show desirable properties in terms of interpretability and proper handling of high dimensionality on synthetic and real data. The model has further been applied to fMRI data, where the method is used to successfully identify different functional brain networks and their activation times.

This probabilistic model of data defined through graphs can be very meaningful even when no data is available. Thus, in the second part of this thesis, we use such models to represent each graph through the probabilistic distribution of data, which varies smoothly on the graph. Optimal transport allows for a comparison of two such distributions, which in turn gives a structurally meaningful measure for graph comparison. We follow by using this distance to formulate a new graph alignment problem based on the optimal transport framework, and propose an efficient stochastic algorithm based on Bayesian exploration to accommodate for the nonconvexity of the graph alignment problem. We demonstrate the performance of our novel framework on different tasks like graph alignment, graph classification and graph signal prediction, and we show that our method leads to significant improvement with respect to the state-of-art algorithms.

Furthermore, we cast a new formulation for the one-to-many graph alignment problem, allowing for comparison of graphs of different sizes. The resulting alignment problem is solved with stochastic gradient descent, where a novel Dykstra operator ensures that the solution is a one-to-many (soft) assignment matrix. Experiments on graph alignment and graph classification problems show that our method for one-to-many alignment leads to meaningful improvements with respect to the state-of-the-art algorithms for each of these tasks.

Finally, we explore a family of probabilistic distributions for data based on graph filters. Distances defined through a graph filter give a high level of flexibility in choosing which graph properties we want to emphasize. In addition, in order to make the above graph alignment problem more scalable, we formulate an approximation to our filter Wasserstein graph distance that allows for the exploitation of faster algorithms, without grossly sacrificing the performance. We propose two algorithms, a simple one based on mirror gradient descent and another one built on its stochastic version, which offers a trade-off between speed and accuracy. Our experiments show the performance benefits of our novel stochastic algorithm, as well as the strong value of flexibility offered by filter-based distances.

  • Files
  • Details
  • Metrics
Loading...
Thumbnail Image
Name

EPFL_TH7294.pdf

Type

N/a

Access type

openaccess

License Condition

Copyright

Size

14.44 MB

Format

Adobe PDF

Checksum (MD5)

626075593a3bdf6238381f3d0258ee0b

Logo EPFL, École polytechnique fédérale de Lausanne
  • Contact
  • infoscience@epfl.ch

  • Follow us on Facebook
  • Follow us on Instagram
  • Follow us on LinkedIn
  • Follow us on X
  • Follow us on Youtube
AccessibilityLegal noticePrivacy policyCookie settingsEnd User AgreementGet helpFeedback

Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, tous droits réservés