Safety is a crucial concern when deploying reinforcement learning (RL) algorithms in real-world applications. It has many dimensions, ranging from ensuring reasonable performance to respecting predefined constraints.

In this talk, we focus on safety in the offline setting, where the RL agent has access only to a fixed dataset of prior trajectories and cannot interact directly with the environment. Given that the behavior policy responsible for data collection is available, the primary challenge is to craft a policy that outperforms it.

We will present algorithms that leverage the behavior policy to compute a new policy that improves on it with high probability, and discuss how to exploit the environment’s structure to improve sample efficiency.

Speaker

Thiago D. Simão is an assistant professor at Eindhoven University of Technology.

Time and Place

Monday 18/03/2026 at 13:45 in M.A.143

Registration

Participation is free, but registration is compulsory.

References and Related Reading