A plethora of real-world problems involve multiple agents that interact, learn, cooperate, coordinate, and compete with one another in increasingly complex environments. Examples include autonomous vehicles, robotic agents, intelligent infrastructure, and IoT devices. As more autonomous agents are deployed in the real world, there will be a growing need for novel algorithms, theory, and tools that enable coordination on a massive scale. In this thesis, we develop such tools to tackle two central challenges in multi-agent coordination research: allocation and resource sharing. We focus on solutions that are scalable, practical, and applicable to real-world problems.
In the first part of the thesis, we tackle the problem of allocating resources to agents, i.e., solving a weighted matching problem. Real-world matching problems may arise in massively large systems; they are distributed and information-restrictive, and individuals have to reveal their preferences over the possible matches in order to obtain a high-quality match, which poses significant privacy risks. As such, there are three main challenges: complexity, communication, and privacy.
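For reference, the allocation problem described above can be written as the standard maximum-weight matching program. The notation below (agents $\mathcal{N}$, resources $\mathcal{R}$, utilities $u_n(r)$, indicators $x_{n,r}$) is ours and may differ from the thesis; the utilities $u_n(r)$ are exactly the private preferences whose disclosure creates the privacy risk.

```latex
\max_{x}\; \sum_{n \in \mathcal{N}} \sum_{r \in \mathcal{R}} u_n(r)\, x_{n,r}
\quad \text{s.t.} \quad
\sum_{r \in \mathcal{R}} x_{n,r} \le 1 \;\; \forall n \in \mathcal{N}, \qquad
\sum_{n \in \mathcal{N}} x_{n,r} \le 1 \;\; \forall r \in \mathcal{R}, \qquad
x_{n,r} \in \{0, 1\}.
```

Solving this exactly requires a central solver that sees every $u_n(r)$, which is what a decentralized heuristic tries to avoid.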
Our proposed approach, ALMA, is a practical heuristic designed for real-world, large-scale applications (on the order of $10^6$ agents). It is based on a simple altruistic behavioral convention: agents are more likely to back off from contesting a resource if they have good alternatives, potentially freeing the resource for an agent that does not. ALMA tackles all of the aforementioned challenges: it is decentralized, runs on-device, requires no inter-agent communication, converges in constant time (under reasonable assumptions), and provides strong, worst-case privacy guarantees. Moreover, by incorporating learning, we can mitigate the loss in social welfare and increase fairness. Finally, rational agents can use such simple conventions, along with an arbitrary signal from the environment, to learn a correlated equilibrium for accessing a set of resources under high congestion.
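To make the back-off convention concrete, here is a minimal toy simulation in Python. It is a sketch under stated assumptions, not ALMA as published: the function names `backoff_probability` and `alma_sketch` are hypothetical, the linear back-off curve and its clipping bounds (0.05, 0.95) are our own choice, rounds are synchronous, and an agent's "loss" is measured against the next resource in its list regardless of whether that resource is free. Each agent decides alone, using only its private utilities and whether its current resource is free, taken, or contested.

```python
import random
from typing import Dict, List, Optional, Set


def backoff_probability(loss: float, floor: float = 0.05, ceil: float = 0.95) -> float:
    """Hypothetical back-off curve (an assumption, not ALMA's published one).

    `loss` is the utility drop from yielding the contested resource and moving
    to the next one in the agent's list. A small loss (a good alternative)
    gives a high probability of yielding; clipping keeps both outcomes possible.
    """
    return min(ceil, max(floor, 1.0 - loss))


def alma_sketch(utilities: List[List[float]], max_rounds: int = 10_000,
                seed: int = 0) -> List[Optional[int]]:
    """Toy synchronous simulation of an altruistic back-off convention.

    utilities[i][r] in [0, 1] is agent i's private value for resource r.
    Returns the resource acquired by each agent, or None if it got nothing.
    """
    rng = random.Random(seed)
    n, m = len(utilities), len(utilities[0])
    # Each agent ranks resources by its own utilities; nothing is shared.
    prefs = [sorted(range(m), key=lambda r: -utilities[i][r]) for i in range(n)]
    pos = [0] * n                          # index into prefs[i] being contested
    won: List[Optional[int]] = [None] * n
    taken: Set[int] = set()

    for _ in range(max_rounds):
        searching = [i for i in range(n) if won[i] is None and pos[i] < m]
        if not searching:
            break
        claims: Dict[int, List[int]] = {}
        for i in searching:
            r = prefs[i][pos[i]]
            if r in taken:                 # occupied: move on to next preference
                pos[i] += 1
            else:
                claims.setdefault(r, []).append(i)
        for r, contenders in claims.items():
            if len(contenders) == 1:       # uncontested: acquire it
                won[contenders[0]] = r
                taken.add(r)
                continue
            for i in contenders:           # collision: each agent decides alone
                nxt = pos[i] + 1
                alt = utilities[i][prefs[i][nxt]] if nxt < m else 0.0
                if rng.random() < backoff_probability(utilities[i][r] - alt):
                    pos[i] += 1            # yield and try the next resource
    return won


print(alma_sketch([[1.0, 0.2], [0.1, 0.9]]))  # [0, 1]: no conflict, no back-off
```

Agents never revisit a resource they yielded, so, as with any heuristic of this kind, some welfare can be lost relative to the optimal matching; the clipping ensures no collision persists forever with certainty.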
In the second part of the thesis, we focus on a critical open problem: cooperation in socio-ecological and socio-economic systems, and sustainability in the use of common-pool resources. In recent years, learning agents, especially deep reinforcement learning agents, have become ubiquitous in such systems. Yet scaling to environments with a large number of agents and low observability remains a challenge. In common-pool resources, individuals face strong incentives to appropriate the resource, which results in overuse and even depletion. Our goal is to apply simple interventions that steer the population toward desirable states.
We propose a simple, yet powerful and robust, technique: allowing agents to observe an arbitrary common signal from the environment. The agents learn to couple their policies and avoid depletion in a wider range of settings, while achieving higher social welfare and faster convergence.
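The mechanism of conditioning on a meaningless shared signal can be illustrated with a deliberately small stand-in. This sketch is an assumption throughout and is not the thesis's environment (which uses deep reinforcement learning): a stateless game in which harvesting pays 1 unless more than `capacity` agents harvest at once, in which case every harvester pays -1, and independent epsilon-greedy learners keep one value table per observed signal. The names `reward`, `train`, `greedy_policy`, and `welfare`, and all default parameters, are hypothetical.

```python
import random
from typing import List


def reward(harvest: bool, harvesters: int, capacity: int) -> float:
    """Toy payoff (assumption): harvesting pays 1 unless more than `capacity`
    agents harvest at once (overuse), then each harvester pays -1.
    Abstaining always pays 0."""
    if not harvest:
        return 0.0
    return 1.0 if harvesters <= capacity else -1.0


def train(n_agents: int = 6, capacity: int = 2, n_signals: int = 3,
          episodes: int = 20_000, alpha: float = 0.1, epsilon: float = 0.1,
          seed: int = 0) -> List[List[List[float]]]:
    """Independent learners; q[i][s][a] is agent i's estimate for action a
    (0 = abstain, 1 = harvest) after observing common signal s."""
    rng = random.Random(seed)
    q = [[[0.0, 0.0] for _ in range(n_signals)] for _ in range(n_agents)]
    for _ in range(episodes):
        s = rng.randrange(n_signals)       # arbitrary signal with no intrinsic meaning
        actions = []
        for i in range(n_agents):
            if rng.random() < epsilon:     # explore
                actions.append(rng.randrange(2))
            else:                          # exploit own estimate for this signal
                actions.append(int(q[i][s][1] > q[i][s][0]))
        h = sum(actions)
        for i, a in enumerate(actions):    # each agent sees only its own reward
            q[i][s][a] += alpha * (reward(bool(a), h, capacity) - q[i][s][a])
    return q


def greedy_policy(q: List[List[List[float]]]) -> List[List[int]]:
    """policy[i][s] = 1 if agent i harvests on signal s."""
    return [[int(qs[1] > qs[0]) for qs in qi] for qi in q]


def welfare(policy: List[List[int]], s: int, capacity: int) -> float:
    """Total reward of the greedy joint action under signal s."""
    h = sum(p[s] for p in policy)
    return sum(reward(bool(p[s]), h, capacity) for p in policy)
```

A natural experiment is to compare `train(n_signals=1)` against `train(n_signals=3)` and inspect `welfare` per signal; with several signals, agents can in principle take turns (for example, different pairs harvesting on different signals) without ever communicating. We make no claim about how often this toy reaches such an outcome.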
Finally, we propose a practical approach to computing market prices and allocations via a deep reinforcement learning policymaker agent. Compared to the idealized market equilibrium outcome, which cannot always be computed efficiently, our policymaker is much more flexible: it allows us to tune prices with regard to diverse objectives such as sustainability and resource wastefulness, fairness, and buyers' and sellers' welfare.