Infoscience
Doctoral thesis, EPFL

Parsimonious Online Learning with Kernels and Random Features with Applications to Stochastic Optimal Control

Le Moal, Quentin  
2025

This thesis studies stochastic optimal control through three interconnected contributions: a theoretical framework and two regression methods, POLK and POLRF, tailored to the high-dimensional regression problems that arise in this domain.

In the first part, we propose a general framework for solving stochastic optimal control problems using dynamic programming. A key contribution is the decomposition of total error at each time step into local approximation errors and propagated errors from future steps. This decomposition provides new theoretical insights into error backpropagation, a relatively underexplored area in stochastic control. While this framework establishes a solid foundation, its applicability is limited to problems amenable to dynamic programming.
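The decomposition can be sketched schematically as follows (the notation is ours, not the thesis's): writing $V_t$ for the true value function at step $t$ and $\hat{V}_t$ for its computed approximation, the total error at step $t$ splits into a local term and a term inherited from the step that follows,

```latex
\| V_t - \hat{V}_t \|
  \;\le\;
  \underbrace{\varepsilon_t}_{\text{local approximation error}}
  \;+\;
  \underbrace{L_t \, \| V_{t+1} - \hat{V}_{t+1} \|}_{\text{error propagated from step } t+1},
```

where $\varepsilon_t$ is the regression error committed at step $t$ and $L_t$ is a problem-dependent stability constant; unrolling the recursion backwards from the terminal step bounds the total error by a weighted sum of the local errors.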

The second part introduces POLK (Parsimonious Online Learning with Kernels), a kernel-based regression method enhanced with sparsification to control model complexity. POLK achieves efficient evaluation times and performs well on intricate regression tasks with localised variations. We further improve its empirical convergence speed by incorporating a second gradient term while maintaining theoretical guarantees. However, the computational cost of the sparsification process restricts its scalability to larger datasets, limiting its practical use in high-dimensional settings.
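The flavour of the method can be conveyed with a minimal sketch: functional stochastic gradient descent in an RKHS, where each sample adds a kernel centre and a crude pruning pass keeps the dictionary small. This is a hypothetical simplification for illustration only; the class name, the residual-based update, and the weight-threshold pruning are our assumptions and stand in for the thesis's actual sparsification procedure.

```python
import numpy as np

def gaussian_kernel(x, y, bw=1.0):
    # RBF kernel between two points
    d = x - y
    return np.exp(-np.dot(d, d) / (2 * bw**2))

class POLKSketch:
    """Illustrative sketch of parsimonious online kernel regression:
    functional SGD on squared loss, followed by greedy pruning of
    kernel centres whose weights are negligible. (Hypothetical
    simplification; details differ from the thesis's POLK.)"""

    def __init__(self, bw=0.5, lr=0.5, prune_tol=1e-3):
        self.bw, self.lr, self.prune_tol = bw, lr, prune_tol
        self.centers, self.weights = [], []

    def predict(self, x):
        return sum(w * gaussian_kernel(x, c, self.bw)
                   for c, w in zip(self.centers, self.weights))

    def update(self, x, y):
        # Functional SGD step: add a new kernel centre at x with
        # weight proportional to the current residual.
        err = y - self.predict(x)
        self.centers.append(x)
        self.weights.append(self.lr * err)
        self._prune()

    def _prune(self):
        # Drop centres whose weight is tiny: a crude stand-in for
        # the sparsification that controls model complexity.
        kept = [(c, w) for c, w in zip(self.centers, self.weights)
                if abs(w) > self.prune_tol]
        self.centers = [c for c, _ in kept]
        self.weights = [w for _, w in kept]
```

The pruning pass is what keeps evaluation cheap, but — as the paragraph above notes — scanning and rebuilding the dictionary on every update is exactly the kind of cost that limits scalability on large datasets.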

In the third part, we develop POLRF (Parsimonious Online Learning with Random Features), a scalable alternative to kernel-based methods. POLRF replaces kernels with random features, dynamically adapting its feature map distribution through sparsification. This adaptation resembles kernel learning, allowing POLRF to construct problem-specific representations. We establish a theoretical framework for POLRF, including convergence guarantees in a trajectory-dependent reproducing kernel Hilbert space (RKHS). Empirical results on the max-call option pricing problem demonstrate POLRF's competitive performance, including faster convergence and robustness in high dimensions. Despite its promise, open questions remain regarding the properties of the constructed RKHS and the optimisation of random feature distributions for specific problems.
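The random-feature side of the story can likewise be sketched in a few lines: approximate a Gaussian kernel with random Fourier features and fit the feature weights online by SGD. This is a static-feature stand-in under our own assumptions (feature count, bandwidth, learning rate, and the target function are all illustrative); it deliberately omits POLRF's defining ingredient, the sparsification-driven adaptation of the feature distribution.

```python
import numpy as np

def random_fourier_features(X, W, b):
    # Random Fourier feature map approximating a Gaussian kernel:
    # phi(x) = sqrt(2/D) * cos(W x + b)
    D = W.shape[0]
    return np.sqrt(2.0 / D) * np.cos(X @ W.T + b)

rng = np.random.default_rng(0)
d, D = 2, 200            # input dimension, number of random features
bw = 1.0                 # kernel bandwidth (assumed)
W = rng.normal(0.0, 1.0 / bw, size=(D, d))   # random frequencies
b = rng.uniform(0.0, 2 * np.pi, size=D)      # random phase offsets

# Online least squares on the fixed features via plain SGD, over a
# stream of samples from a smooth illustrative target.
theta = np.zeros(D)
lr = 0.5
for _ in range(3000):
    x = rng.uniform(-1, 1, size=(1, d))
    y = np.sin(x.sum())
    phi = random_fourier_features(x, W, b)[0]
    theta += lr * (y - phi @ theta) * phi
```

Because each update touches only a length-$D$ weight vector, the per-sample cost is independent of how many samples have been seen, which is the scalability advantage over kernel dictionaries; POLRF additionally resamples and reweights the frequencies `W` themselves, which is where the kernel-learning behaviour described above comes from.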

Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved.