Repository logo

Infoscience

  • English
  • French
Log In
Logo EPFL, École polytechnique fédérale de Lausanne

Infoscience

  • English
  • French
Log In
  1. Home
  2. Academic and Research Output
  3. EPFL thesis
  4. Multi-armed Bandits in Action
 
doctoral thesis

Multi-armed Bandits in Action

Salehi, Farnood  
2020

Making decisions is part and parcel of being human. Among a set of actions, we want to choose the one that has the highest reward. But the uncertainty of the outcome prevents us from always making the right decision. Making decisions under uncertainty can be studied in a principled way by the exploitation-exploration framework. The multi-armed bandit (MAB) framework is perhaps one of the simplest, and yet one of the most powerful settings to optimize a sequence of choices within an exploitation-exploration framework. In this thesis, I study several MAB related problems, from three different perspectives: I study (1) how machine-learning (ML) can benefit from MAB algorithms, (2) how MAB algorithms can affect humans, and (3) how human interactions can be studied in the MAB framework.

(1) Optimization lies at the heart of almost all ML algorithms. Stochastic-gradient descent (SGD) and stochastic-coordinate descent (CD) are perhaps two of the most well known and widely used optimization algorithms. In Chapters 2 and 3, I revisit the datapoint and coordinate-selection procedure of SGD and CD from an MAB point of view. The goal is to reduce the training time of ML models. SGD works by estimating, on the fly, the gradient of the cost function by sampling a datapoint uniformly at random from a training set, and CD works by updating only a single decision variable (coordinate) sampled at random. Updating the modelâ s parameters based on, respectively, different datapoints or different coordinates, yields various improvements. However, a priori, it is not clear which datapoint or coordinate improves the model the most. I address this challenge by studying these problems in an MAB setting, and I develop algorithms to learn the optimal datapoint or coordinate selection strategies. Our methods often significantly reduce the training time of several machine-learning models. (2) Although some MAB algorithms are designed to improve ML algorithms, they can affect humansâ opinions about the outside world. In the personalized recommender systems, the goal is to predict the preference of a user and to suggest the best content to them. However, recent studies suggest that a personalized algorithm can learn and propagate systematic biases and polarize opinions [Pariser, 2011]. In Chapter 4, to combat bias, I propose to use constraints on the distribution from which a content is selected. The constraints can be designed to ameliorate polarization and biases. I combine the classic MAB setting with these constraints and show how an adaptation of an MAB algorithm can lead to a scalable algorithm with provable guarantees for the constrained setting. Furthermore, I show that our algorithms often significantly outperform the existing algorithms that we could apply to this setting.

(3) Interacting with others is one of the main sources of information for us. In Chapter 5, I study how this natural setting in the human world can be studied in the bandit framework. I extend the classic single decision-maker setting of MAB to multiple decision-makers, where a decision-maker observes her neighborsâ decisions and rewards. Presumably, the additional information of the neighbors should improve the decisions. I show how to model such a decision-making process that appeals to the classic MAB framework. I study the new setting, in both stochastic and adversarial MAB frameworks, and I develop algorithms that incorporate the additional knowledge of the peers.

  • Files
  • Details
  • Metrics
Type
doctoral thesis
DOI
10.5075/epfl-thesis-9935
Author(s)
Salehi, Farnood  
Advisors
Thiran, Patrick  
•
Celis, Laura Elisa  
Jury

Prof. Ola Nils Anders Svensson (président) ; Prof. Patrick Thiran, Dr Laura Elisa Celis (directeurs) ; Prof. Martin Jaggi, Prof. Stephan Mandt, Dr Silvio Lattanzi (rapporteurs)

Date Issued

2020

Publisher

EPFL

Publisher place

Lausanne

Public defense year

2020-02-21

Thesis number

9935

Total of pages

180

Subjects

multi-armed bandit

•

recommender systems

•

uncertainty

•

decision making

•

machine learning

•

stochastic optimization

•

stochastic gradient descent

•

stochastic coordinate descent

•

polarization

EPFL units
INDY2  
Faculty
IC  
School
IINFCOM  
Doctoral School
EDIC  
Available on Infoscience
February 17, 2020
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/166260
Logo EPFL, École polytechnique fédérale de Lausanne
  • Contact
  • infoscience@epfl.ch

  • Follow us on Facebook
  • Follow us on Instagram
  • Follow us on LinkedIn
  • Follow us on X
  • Follow us on Youtube
AccessibilityLegal noticePrivacy policyCookie settingsEnd User AgreementGet helpFeedback

Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, tous droits réservés