Reliable Offline Reinforcement Learning
Safety is a crucial concern when deploying reinforcement learning (RL) algorithms in real-world applications. Furthermore, safety has many dimensions, ranging from ensuring reasonable performance to respecting predefined constraints.
In this talk, we focus on safety from an offline perspective, where the RL agent has access only to a fixed dataset of prior trajectories, without direct interaction with the environment. Given the availability of the behavior policy responsible for data collection, the primary challenge is crafting a policy that outperforms this behavior policy.
We will present algorithms that leverage the behavior policy to compute a policy that improves on it with high probability, and discuss how to exploit the environment’s structure to improve sample efficiency.
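To give a flavor of this family of algorithms, the sketch below illustrates a policy-improvement step in the spirit of Safe Policy Improvement with Baseline Bootstrapping (Laroche et al., 2019): the improved policy copies the behavior policy on state-action pairs that were rarely observed in the dataset and only optimizes where the data supports it. The tabular setting and the names `Q`, `pi_b`, `counts`, and `n_min` are illustrative assumptions, not the exact method presented in the talk.

```python
import numpy as np

def spibb_improvement_step(Q, pi_b, counts, n_min):
    """Sketch of a SPIBB-style greedy policy-improvement step (after Laroche et al., 2019).

    Q:      (S, A) state-action value estimates computed from the offline dataset
    pi_b:   (S, A) behavior policy that collected the data
    counts: (S, A) number of occurrences of each state-action pair in the data
    n_min:  count threshold; pairs seen fewer than n_min times are "bootstrapped"
    """
    pi = np.zeros_like(pi_b)
    for s in range(Q.shape[0]):
        bootstrapped = counts[s] < n_min
        # Keep the behavior policy's probabilities on poorly estimated actions...
        pi[s, bootstrapped] = pi_b[s, bootstrapped]
        # ...and put the remaining mass on the best well-estimated action.
        free_mass = 1.0 - pi[s, bootstrapped].sum()
        safe_actions = np.flatnonzero(~bootstrapped)
        if safe_actions.size > 0:
            best = safe_actions[np.argmax(Q[s, safe_actions])]
            pi[s, best] += free_mass
        else:
            pi[s] = pi_b[s]  # no well-estimated action: fall back to the behavior policy
    return pi
```

Constraining the improved policy to the behavior policy on under-sampled pairs is what yields the high-probability improvement guarantee discussed in the talk.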
Speaker
Thiago D. Simão is an assistant professor at Eindhoven University of Technology.
Time and Place
Monday 18/03/2026 at 13:45 in M.A.143
Registration
Participation is free, but registration is compulsory.
References and Related Reading
- Laroche, R., Trichelair, P., and Tachet des Combes, R. (2019). Safe policy improvement with baseline bootstrapping. ICML, 3652–3661.
- Levine, S., Kumar, A., Tucker, G., and Fu, J. (2020). Offline reinforcement learning: Tutorial, review, and perspectives on open problems. arXiv preprint.
- Simão, T. D., and Spaan, M. T. J. (2019a). Safe policy improvement with baseline bootstrapping in factored environments. AAAI, 4967–4974.
- Simão, T. D., and Spaan, M. T. J. (2019b). Structure learning for safe policy improvement. IJCAI, 3453–3459.
- Simão, T. D., Suilen, M., and Jansen, N. (2023). Safe policy improvement for POMDPs via finite-state controllers. AAAI, 212–220.