Regret-Optimized Portfolio Enhancement through Deep Reinforcement Learning and Future Looking Rewards
This paper introduces a novel agent-based approach for enhancing existing portfolio strategies using Proximal Policy Optimization (PPO). Rather than focusing solely on traditional portfolio construction, our approach aims to improve an already high-performing strategy through dynamic rebalancing driven by PPO and Oracle agents. Our target is to enhance the traditional 60/40 benchmark (60% stocks, 40% bonds) by employing the Regret-based Sharpe reward function. To address the impact of transaction fee frictions and prevent signal loss, we develop a transaction cost scheduler. We introduce a future-looking reward function and employ synthetic data training through a circular block bootstrap method to facilitate the learning of generalizable allocation strategies. We focus on two key evaluation measures: return and maximum drawdown. Our method not only enhances the performance of the existing portfolio strategy through strategic rebalancing but also demonstrates strong results compared to other RL baselines.
École Polytechnique Fédérale de Lausanne
École Polytechnique Fédérale de Lausanne
École Polytechnique Fédérale de Lausanne
2025-11-14
New York, NY, USA
979-8-4007-2220-2
890
897
REVIEWED
EPFL
| Event name | Event acronym | Event place | Event date |
ICAIF '25 | Singapore Singapore | 2025-11-15 - 2025-11-18 | |