Infoscience

doctoral thesis

Discrete-Choice Mining of Social Processes

Kristof, Victor  
2021

Poor decisions and selfish behaviors give rise to seemingly intractable global problems, such as the lack of transparency in democratic processes, the spread of conspiracy theories, and the rise in greenhouse gas emissions. However, people are more predictable than we think, and with machine-learning algorithms and sufficiently large datasets, we can design accurate models of human behavior in a variety of settings. In this thesis, to gain insight into social processes, we develop highly interpretable probabilistic choice models. We draw from the econometrics literature on discrete-choice models and combine them with matrix factorization methods, Bayesian statistics, and generalized linear models. These predictive models enable interpretability through their learned parameters and latent factors.

First, we study the social dynamics behind group collaborations for the collective creation of content, such as in Wikipedia, the Linux kernel, and the European Union law-making process. By combining the Bradley-Terry and Rasch models with matrix factorization and natural language processing, we develop a model of edit acceptance in peer-production systems. We discover controversial components (e.g., Wikipedia articles and European laws) and influential users (e.g., Wikipedia editors and parliamentarians), as well as features that correlate with a high probability of edit acceptance. The latent representations capture non-linear interactions between components and users, and they cluster well into different topics (e.g., historical figures and TV characters in Wikipedia, business and environment in European laws).
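As a rough illustration of how a Rasch-style term and a matrix-factorization term can be combined, the sketch below scores one edit. It is a minimal sketch under stated assumptions, not the model from the thesis: the names `skill`, `difficulty`, `user_factors`, and `item_factors` are hypothetical, and the logistic link with an additive inner-product term is an assumed form.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def acceptance_probability(skill, difficulty, user_factors=(), item_factors=()):
    """Assumed form: P(accept) = sigmoid(skill - difficulty + <x_u, y_i>).

    skill        -- scalar parameter of the user (hypothetical name)
    difficulty   -- scalar parameter of the component, e.g. an article or a law
    user_factors -- latent vector of the user (matrix-factorization term)
    item_factors -- latent vector of the component
    """
    interaction = sum(x * y for x, y in zip(user_factors, item_factors))
    return sigmoid(skill - difficulty + interaction)


def sgd_step(skill, difficulty, accepted, lr=0.1):
    """One gradient-ascent step on the Bernoulli log-likelihood (scalar terms only)."""
    p = acceptance_probability(skill, difficulty)
    grad = accepted - p  # derivative of the log-likelihood w.r.t. skill
    return skill + lr * grad, difficulty - lr * grad
```

In this form, a large fitted `difficulty` would mark a component where edits are rarely accepted, and a large `skill` would mark a user whose edits usually are, which is one way such parameters can be read as interpretable quantities.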

Second, we develop an algorithm for predicting the outcome of elections and of referenda by combining matrix factorization and generalized linear models. Our algorithm learns representations of votes and regions, which capture ideological and cultural voting patterns (e.g., liberal/conservative, rural/urban), and it predicts the vote results in unobserved regions from partial observations. We test our model on voting data in Germany, Switzerland, and the US, and we deploy it on a Web platform to predict Swiss referendum votes in real time. On average, our predictions reach a mean absolute error of 1% after observing only 5% of the regions.
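The idea of predicting unobserved regions from partial observations can be sketched as follows. This is a simplified illustration, not the algorithm from the thesis: it assumes that region vectors have already been learned from past votes, it swaps the generalized linear model for plain ridge least squares (identity link), and all names and numbers (`region_vectors`, `fit_vote_vector`, the toy regions A, B, C) are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_vote_vector(region_vectors, observed, ridge=1e-6):
    """Ridge least squares: find w such that region_vectors[r] . w ~ observed[r]."""
    dim = len(next(iter(region_vectors.values())))
    gram = [[ridge if i == j else 0.0 for j in range(dim)] for i in range(dim)]
    rhs = [0.0] * dim
    for region, y in observed.items():
        x = region_vectors[region]
        for i in range(dim):
            rhs[i] += x[i] * y
            for j in range(dim):
                gram[i][j] += x[i] * x[j]
    return solve(gram, rhs)


def predict(region_vectors, w):
    """Predicted vote share for every region, observed or not."""
    return {r: sum(xi * wi for xi, wi in zip(x, w)) for r, x in region_vectors.items()}


# Toy data: the first coordinate is a bias term, the second a made-up latent trait.
region_vectors = {"A": [1.0, 0.0], "B": [1.0, 1.0], "C": [1.0, 0.5]}
observed = {"A": 0.40, "B": 0.60}  # results already counted in regions A and B
w = fit_vote_vector(region_vectors, observed)
forecast = predict(region_vectors, w)  # forecast["C"] is close to 0.50
```

Region C has no reported result, yet its forecast follows from its position in the latent space relative to the regions that have reported. A bounded link function, such as the logistic, would keep predicted shares inside [0, 1], which this identity-link sketch does not guarantee.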

Third, we study how people perceive the carbon footprint of their day-to-day actions. We cast this problem as a comparison problem between pairs of actions (e.g., the difference between flying across continents and using household appliances), and we develop a statistical model of relative comparisons reminiscent of the Thurstone model in psychometrics. The model learns the users' perception as the parameters of a Bayesian linear regression, which enables us to derive an active-learning algorithm to collect data efficiently. Our experiments show that users overestimate the emissions of low-footprint actions and underestimate those of high-footprint actions.
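One way to make the link between Bayesian linear regression and active learning concrete is the standard Gaussian case below. The notation is an assumption for illustration and is not taken from the thesis: $\mathbf{x}_i$ is a feature vector for action $i$, $y_{ij}$ is a user's answer when comparing actions $i$ and $j$, $\mathbf{Z}$ stacks the difference vectors $\mathbf{x}_i - \mathbf{x}_j$ of the pairs answered so far, and $\sigma^2$ and $\sigma_0^2$ are the noise and prior variances.

```latex
\begin{align}
y_{ij} &= \mathbf{w}^\top (\mathbf{x}_i - \mathbf{x}_j) + \varepsilon,
  \qquad \varepsilon \sim \mathcal{N}(0, \sigma^2),
  \qquad \mathbf{w} \sim \mathcal{N}(\mathbf{0}, \sigma_0^2 \mathbf{I}) \\
\boldsymbol{\Sigma} &= \left( \sigma_0^{-2} \mathbf{I} + \sigma^{-2} \mathbf{Z}^\top \mathbf{Z} \right)^{-1},
  \qquad \boldsymbol{\mu} = \sigma^{-2} \boldsymbol{\Sigma} \mathbf{Z}^\top \mathbf{y} \\
(i^\star, j^\star) &= \arg\max_{(i,j)} \; (\mathbf{x}_i - \mathbf{x}_j)^\top \boldsymbol{\Sigma} \, (\mathbf{x}_i - \mathbf{x}_j)
\end{align}
```

Under these assumptions the posterior over $\mathbf{w}$ is Gaussian with mean $\boldsymbol{\mu}$ and covariance $\boldsymbol{\Sigma}$, and the last line picks the next pair to ask about as the one whose answer is currently most uncertain. In the linear-Gaussian case this also maximizes the expected information gain, because the gain grows monotonically with that predictive variance.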

Finally, we design a probabilistic model of pairwise-comparison outcomes that captures a wide range of time dynamics. We achieve this by replacing the static parameters of a class of popular pairwise-comparison models with continuous-time Gaussian processes. We also develop an efficient inference algorithm that computes, with only a few linear-time iterations over the data, an approximate Bayesian posterior distribution.
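For intuition, one member of this class can be written as follows, assuming a Bradley-Terry-style logistic link. The symbols are illustrative assumptions rather than the notation of the thesis: $s_i(t)$ is the latent score of item $i$ at time $t$, and $k$ is a covariance function chosen by the modeler.

```latex
\begin{align}
s_i(\cdot) &\sim \mathcal{GP}\bigl(0, k(t, t')\bigr) \\
P(i \succ j \mid t) &= \frac{1}{1 + \exp\bigl(-(s_i(t) - s_j(t))\bigr)}
\end{align}
```

With a constant score $s_i(t) = s_i$ this reduces to the static Bradley-Terry model. The choice of $k$ then determines which dynamics the scores can express, for example smooth drifts or periodic patterns.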

Type
doctoral thesis
DOI
10.5075/epfl-thesis-7186
Author(s)
Kristof, Victor  
Advisors
Thiran, Patrick • Grossglauser, Matthias
Jury

Prof. Karl Aberer (president); Prof. Patrick Thiran, Prof. Matthias Grossglauser (thesis directors); Prof. Robert West, Prof. Scott Hale, Prof. Rayid Ghani (examiners)

Date Issued

2021

Publisher

EPFL

Publisher place

Lausanne

Public defense date

2021-06-24

Thesis number

7186

Total of pages

167

Subjects

discrete-choice models • matrix factorization • Bayesian statistics • generalized linear models • comparisons • choices • probabilistic models • data mining • machine learning • computational social science

EPFL units
INDY2  
Faculty
IC  
School
IINFCOM  
Doctoral School
EDIC  
Available on Infoscience
June 21, 2021
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/179445

Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved