Multi-armed Bandits in Action

Salehi, Farnood

doi:10.5075/epfl-thesis-9935

doctoral thesis

Multi-armed Bandits in Action

2020

Making decisions is part and parcel of being human. Among a set of actions, we want to choose the one that has the highest reward. But the uncertainty of the outcome prevents us from always making the right decision. Making decisions under uncertainty can be studied in a principled way by the exploitation-exploration framework. The multi-armed bandit (MAB) framework is perhaps one of the simplest, and yet one of the most powerful settings to optimize a sequence of choices within an exploitation-exploration framework. In this thesis, I study several MAB related problems, from three different perspectives: I study (1) how machine-learning (ML) can benefit from MAB algorithms, (2) how MAB algorithms can affect humans, and (3) how human interactions can be studied in the MAB framework.

(1) Optimization lies at the heart of almost all ML algorithms. Stochastic-gradient descent (SGD) and stochastic-coordinate descent (CD) are perhaps two of the most well known and widely used optimization algorithms. In Chapters 2 and 3, I revisit the datapoint and coordinate-selection procedure of SGD and CD from an MAB point of view. The goal is to reduce the training time of ML models. SGD works by estimating, on the fly, the gradient of the cost function by sampling a datapoint uniformly at random from a training set, and CD works by updating only a single decision variable (coordinate) sampled at random. Updating the modelâ s parameters based on, respectively, different datapoints or different coordinates, yields various improvements. However, a priori, it is not clear which datapoint or coordinate improves the model the most. I address this challenge by studying these problems in an MAB setting, and I develop algorithms to learn the optimal datapoint or coordinate selection strategies. Our methods often significantly reduce the training time of several machine-learning models. (2) Although some MAB algorithms are designed to improve ML algorithms, they can affect humansâ opinions about the outside world. In the personalized recommender systems, the goal is to predict the preference of a user and to suggest the best content to them. However, recent studies suggest that a personalized algorithm can learn and propagate systematic biases and polarize opinions [Pariser, 2011]. In Chapter 4, to combat bias, I propose to use constraints on the distribution from which a content is selected. The constraints can be designed to ameliorate polarization and biases. I combine the classic MAB setting with these constraints and show how an adaptation of an MAB algorithm can lead to a scalable algorithm with provable guarantees for the constrained setting. Furthermore, I show that our algorithms often significantly outperform the existing algorithms that we could apply to this setting.

(3) Interacting with others is one of the main sources of information for us. In Chapter 5, I study how this natural setting in the human world can be studied in the bandit framework. I extend the classic single decision-maker setting of MAB to multiple decision-makers, where a decision-maker observes her neighborsâ decisions and rewards. Presumably, the additional information of the neighbors should improve the decisions. I show how to model such a decision-making process that appeals to the classic MAB framework. I study the new setting, in both stochastic and adversarial MAB frameworks, and I develop algorithms that incorporate the additional knowledge of the peers.

Type

doctoral thesis

DOI

10.5075/epfl-thesis-9935

Author(s)

Salehi, Farnood

Advisors

Thiran, Patrick

•

Celis, Laura Elisa

Jury

Prof. Ola Nils Anders Svensson (président) ; Prof. Patrick Thiran, Dr Laura Elisa Celis (directeurs) ; Prof. Martin Jaggi, Prof. Stephan Mandt, Dr Silvio Lattanzi (rapporteurs)

Date Issued

2020

Publisher

EPFL

Publisher place

Lausanne

Public defense year

2020-02-21

Thesis number

9935

Total of pages

180

Subjects

multi-armed bandit

•

recommender systems

•

uncertainty

•

decision making

•

machine learning

•

stochastic optimization

•

stochastic gradient descent

•

stochastic coordinate descent

•

polarization

EPFL units

Faculty

School

Doctoral School

Available on Infoscience

February 17, 2020

Use this identifier to reference this record

https://infoscience.epfl.ch/handle/20.500.14299/166260