MAD: A Magnitude And Direction Policy Parametrization for Stability Constrained Reinforcement Learning
We introduce magnitude and direction (MAD) policies, a policy parametrization for reinforcement learning (RL) that preserves ℓp closed-loop stability for nonlinear dynamical systems. Although methods based on nonlinear Youla and system-level synthesis are complete, in the sense that they describe all stabilizing controllers, they are hampered by the difficulty of parametrizing ℓp-stable operators. In contrast, MAD policies introduce explicit feedback on state-dependent features – a key element behind the success of RL pipelines – without jeopardizing closed-loop stability. This is achieved by letting the magnitude of the control input be described by a disturbance-feedback ℓp-stable operator, while its direction is selected from state-dependent features through a universal function approximator. We further characterize the robust stability properties of MAD policies under model mismatch. Unlike existing disturbance-feedback parametrizations, MAD policies introduce state-feedback components compatible with model-free RL pipelines, ensuring closed-loop stability with no model information beyond the assumption of open-loop stability. Numerical experiments show that MAD policies trained with deep deterministic policy gradient (DDPG) methods generalize to unseen scenarios, matching the performance of standard neural network policies while guaranteeing closed-loop stability by design.
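As a rough illustration of the parametrization described in the abstract, the sketch below splits the control input into a scalar magnitude computed from the disturbance history and a unit-norm direction computed from the state. All names, layer sizes, and the exponentially decaying filter standing in for a general ℓp-stable operator are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn


class MADPolicy(nn.Module):
    """Sketch of a magnitude-and-direction (MAD) policy: u_t = m_t * d_t."""

    def __init__(self, state_dim: int, input_dim: int, hidden: int = 64):
        super().__init__()
        # Direction network: an arbitrary universal approximator of the
        # state is safe here, since normalization confines its output to
        # the unit sphere and so cannot destabilize the loop.
        self.direction_net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, input_dim),
        )
        # Trainable gain of the magnitude channel (hypothetical choice).
        self.mag_gain = nn.Parameter(torch.tensor(1.0))

    def forward(self, state: torch.Tensor, dist_history: torch.Tensor) -> torch.Tensor:
        # Magnitude from past disturbances only. An exponentially decaying
        # window with |rho| < 1 is one simple ℓp-stable disturbance-feedback
        # operator; the paper allows any such operator.
        rho = 0.9
        T = dist_history.shape[0]
        weights = rho ** torch.arange(T - 1, -1, -1, dtype=torch.float32)
        magnitude = self.mag_gain * (weights @ dist_history.norm(dim=-1))
        # Direction from state-dependent features, projected to unit norm.
        raw = self.direction_net(state)
        direction = raw / (raw.norm() + 1e-8)
        return magnitude * direction


# Usage: state x_t in R^4, a window of 10 past disturbances in R^4,
# control input u_t in R^2.
policy = MADPolicy(state_dim=4, input_dim=2)
u = policy(torch.randn(4), torch.randn(10, 4))
```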
EPFL (École Polytechnique Fédérale de Lausanne)
Pages: 942-947
| Event name | Event acronym | Event place | Event date |
| --- | --- | --- | --- |
| | | Rio de Janeiro, Brazil | 2025-12-09 - 2025-12-12 |