Abstract

Making decisions is part and parcel of being human. Among a set of actions, we want to choose the one with the highest reward, but the uncertainty of the outcome prevents us from always making the right decision. Decision-making under uncertainty can be studied in a principled way through the exploration-exploitation trade-off. The multi-armed bandit (MAB) framework is perhaps one of the simplest, yet one of the most powerful, settings for optimizing a sequence of choices under this trade-off. In this thesis, I study several MAB-related problems from three perspectives: (1) how machine learning (ML) can benefit from MAB algorithms, (2) how MAB algorithms can affect humans, and (3) how human interactions can be studied in the MAB framework.

(1) Optimization lies at the heart of almost all ML algorithms. Stochastic gradient descent (SGD) and stochastic coordinate descent (CD) are perhaps two of the best-known and most widely used optimization algorithms. In Chapters 2 and 3, I revisit the datapoint- and coordinate-selection procedures of SGD and CD from an MAB point of view, with the goal of reducing the training time of ML models. SGD estimates, on the fly, the gradient of the cost function by sampling a datapoint uniformly at random from the training set, and CD updates a single decision variable (coordinate) sampled at random. Updating the model's parameters based on different datapoints or different coordinates yields improvements of varying magnitude; however, it is not clear a priori which datapoint or coordinate improves the model the most. I address this challenge by casting these problems in an MAB setting, and I develop algorithms that learn optimal datapoint- and coordinate-selection strategies. These methods often significantly reduce the training time of several ML models.

(2) Although some MAB algorithms are designed to improve ML algorithms, they can also affect humans' opinions about the outside world. In personalized recommender systems, the goal is to predict a user's preferences and to suggest the best content to them. However, recent studies suggest that a personalized algorithm can learn and propagate systematic biases and polarize opinions [Pariser, 2011]. In Chapter 4, to combat bias, I propose to impose constraints on the distribution from which content is selected; the constraints can be designed to mitigate polarization and bias. I combine the classic MAB setting with these constraints and show how an adaptation of an MAB algorithm leads to a scalable algorithm with provable guarantees for the constrained setting. Furthermore, I show that these algorithms often significantly outperform the existing algorithms applicable to this setting.

(3) Interacting with others is one of our main sources of information. In Chapter 5, I study how this natural setting in the human world can be modeled in the bandit framework. I extend the classic single-decision-maker MAB setting to multiple decision-makers, where each decision-maker observes her neighbors' decisions and rewards. Presumably, this additional information should improve the decisions. I show how to model such a decision-making process so that it fits the classic MAB framework, I study the new setting in both the stochastic and the adversarial MAB frameworks, and I develop algorithms that incorporate the additional knowledge from peers.
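To make the coordinate-selection idea of Chapters 2 and 3 concrete, the following sketch treats each coordinate of a toy least-squares problem as a bandit arm and uses a UCB-style rule to favor coordinates with large observed partial derivatives. This is a minimal illustration under simplifying assumptions (a quadratic objective, |partial gradient| as the reward, which is non-stationary as the iterate converges), not the algorithm developed in the thesis; all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy least-squares problem: minimize f(x) = 0.5 * ||A x - b||^2.
n, d = 200, 10
A = rng.normal(size=(n, d))
b = rng.normal(size=n)

x = np.zeros(d)
counts = np.zeros(d)       # how often each coordinate (arm) was chosen
mean_reward = np.zeros(d)  # running mean of observed |partial derivatives|

for t in range(1, 3001):
    # UCB-style score: exploit coordinates with large observed gradients,
    # but keep exploring rarely-chosen ones via the confidence bonus.
    bonus = np.sqrt(2.0 * np.log(t) / np.maximum(counts, 1))
    scores = np.where(counts == 0, np.inf, mean_reward + bonus)
    j = int(np.argmax(scores))

    # Exact minimization of f along coordinate j (safe for quadratics).
    grad_j = A[:, j] @ (A @ x - b)
    x[j] -= grad_j / (A[:, j] @ A[:, j])

    # Use |grad_j| as the reward: a larger gradient means more progress.
    # (A real method would account for the reward shrinking over time.)
    counts[j] += 1
    mean_reward[j] += (abs(grad_j) - mean_reward[j]) / counts[j]

print("final cost:", 0.5 * np.linalg.norm(A @ x - b) ** 2)
```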
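Similarly, the constrained setting of Chapter 4 can be sketched as follows: instead of always playing the arm with the highest optimistic estimate, the learner plays a distribution over arms chosen to maximize the optimistic expected reward subject to constraints. The per-item 5% lower bound, the LP formulation, and all names below are illustrative assumptions, not the thesis's algorithm or its actual constraints.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)

K = 6                               # content items (arms)
true_means = rng.uniform(0.2, 0.8, size=K)

# Illustrative fairness constraint: every item keeps at least a 5%
# chance of being shown, so no viewpoint is filtered out entirely.
lower = np.full(K, 0.05)

counts = np.zeros(K)
means = np.zeros(K)

for t in range(1, 2001):
    bonus = np.sqrt(2.0 * np.log(t) / np.maximum(counts, 1))
    ucb = np.where(counts == 0, 1.0, means + bonus)  # optimistic estimates

    # Play a distribution over arms (not a single arm): maximize the
    # optimistic expected reward subject to the lower-bound constraints.
    res = linprog(-ucb, A_eq=np.ones((1, K)), b_eq=[1.0],
                  bounds=[(lo, 1.0) for lo in lower])
    p = np.clip(res.x, 0.0, None)
    p /= p.sum()  # guard against tiny numerical drift

    a = rng.choice(K, p=p)
    r = float(rng.random() < true_means[a])  # Bernoulli reward

    counts[a] += 1
    means[a] += (r - means[a]) / counts[a]

print("empirical play frequencies:", np.round(counts / counts.sum(), 3))
```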
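Finally, the multiple-decision-maker setting of Chapter 5 can be sketched with agents on a graph who each run a UCB-style rule but also update their statistics with their neighbors' decisions and rewards. The line-graph neighborhoods and all names are assumptions made for illustration; the thesis's stochastic and adversarial algorithms are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(2)

K, n_agents = 5, 4
true_means = rng.uniform(0.0, 1.0, size=K)
best = int(np.argmax(true_means))

# Who observes whom: agent i sees the decisions and rewards of
# everyone in neighbours[i] (a simple line graph here).
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

counts = np.zeros((n_agents, K))
means = np.zeros((n_agents, K))
best_pulls = np.zeros(n_agents)

T = 1000
for t in range(1, T + 1):
    plays = []
    for i in range(n_agents):
        bonus = np.sqrt(2.0 * np.log(t) / np.maximum(counts[i], 1))
        ucb = np.where(counts[i] == 0, np.inf, means[i] + bonus)
        a = int(np.argmax(ucb))
        r = float(rng.random() < true_means[a])
        plays.append((a, r))
        best_pulls[i] += (a == best)

    # Each agent updates with its own observation *and* with everything
    # its neighbours observed this round -- the extra information that
    # a single-agent bandit does not have.
    for i in range(n_agents):
        for j in [i] + neighbours[i]:
            a, r = plays[j]
            counts[i, a] += 1
            means[i, a] += (r - means[i, a]) / counts[i, a]

print("fraction of rounds each agent pulled the best arm:", best_pulls / T)
```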
