Repository logo

Infoscience

  • English
  • French
Log In
Logo EPFL, École polytechnique fédérale de Lausanne

Infoscience

  • English
  • French
Log In
  1. Home
  2. Academic and Research Output
  3. EPFL thesis
  4. Efficient Learning from Comparisons
 
doctoral thesis

Efficient Learning from Comparisons

Maystre, Lucas  
2018

Humans are comparison machines: comparing and choosing an item among a set of alternatives (such as objects or concepts) is arguably one of the most natural ways for us to express our preferences and opinions. In many applications, the analysis of data consisting of comparisons enables finding valuable information. But datasets often contain inconsistent comparison outcomes, because human preferences shift and observations are tainted by noise. A principled approach to dealing with intransitive data is to posit a probabilistic model of comparisons. In this thesis, we revisit Luce's choice model, the study of which began almost a century ago, in the context of large-scale online data collection. We set out to learn a ranking over a set of items from comparisons in a computationally, statistically and data efficient way.

First, we consider the algorithmic problem of estimating model parameters from choice data, and we seek to improve upon the computational and statistical efficiency of existing methods. Our contribution is to show that it is possible to express the maximizer of the model's likelihood function as the stationary distribution of a Markov chain. This enables the use of fast linear solvers or well-studied iterative methods for Markov chains for parameter inference in Luce's model.

Second, we develop a data-efficient method for learning a ranking, by adaptively choosing pairs of items to compare, based on previous comparison outcomes. We begin by showing that Quicksort, a widely-known sorting algorithm, works well even if comparison outcomes are noisy. Under distributional assumptions on model parameters, we provide asymptotic bounds on the quality of the ranking it recovers. Building on this result, we use sorting algorithms as a basis for a simple, practical active-learning method that performs well on real-world datasets, at a small fraction of the computational cost of competing methods.

Third, we focus on structured choices in a network. In particular, we study a model where users navigate in a network (e.g., following links on the Web) and set out to estimate transition probabilities along the edges of the network from limited observations. We show that if transitions follow Luce's axiom, their probability can be inferred using only data consisting of the (marginal) traffic at each node of the network. We propose a robust inference algorithm that admits a computationally-efficient implementation. Our method scales to networks with billions of nodes and achieves good predictive performance on clickstream data.

Beyond human preferences, probabilistic models of pairwise comparisons can also be applied to sports. Consider football: two teams are compared against each other, and the better one wins. In the last part of this thesis, we look at a concrete application of pairwise comparison models and tackle the task of predicting outcomes of matches between national football teams. These teams play only a few matches every year, hence it is difficult to accurately assess their strength. Noting that national team players also compete against each other in clubs, we propose a way to overcome this challenge by taking into account outcomes of matches between clubs, of which there are plenty. We do so by embedding all matches in player space, and devise a computationally-efficient inference procedure. The resulting model predicts international tournament results more accurately than those using only national team results.

  • Files
  • Details
  • Metrics
Type
doctoral thesis
DOI
10.5075/epfl-thesis-8637
Author(s)
Maystre, Lucas  
Advisors
Grossglauser, Matthias  
Jury

Prof. Michael Christoph Gastpar (président) ; Prof. Matthias Grossglauser (directeur de thèse) ; Prof. Martin Jaggi, Prof. Devavrat Shah, Prof. Milan Vojnovic (rapporteurs)

Date Issued

2018

Publisher

EPFL

Publisher place

Lausanne

Public defense year

2018-06-01

Thesis number

8637

Total of pages

116

Subjects

comparisons

•

choices

•

rankings

•

probabilistic models

•

statistical inference

•

algorithms

•

machine learning

•

active learning

•

networks

EPFL units
LCA4  
Faculty
IC  
School
IINFCOM  
Doctoral School
EDIC  
Available on Infoscience
May 23, 2018
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/146586
Logo EPFL, École polytechnique fédérale de Lausanne
  • Contact
  • infoscience@epfl.ch

  • Follow us on Facebook
  • Follow us on Instagram
  • Follow us on LinkedIn
  • Follow us on X
  • Follow us on Youtube
AccessibilityLegal noticePrivacy policyCookie settingsEnd User AgreementGet helpFeedback

Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, tous droits réservés