Fair model-based reinforcement learning comparisons with explicit and consistent update frequency
Joint work with Abdelhakim Benechehab, Giuseppe Paolo and Balázs Kégl.
Implicit update frequencies can introduce ambiguity in the interpretation of model-based reinforcement learning benchmarks, obscuring the real objective of the evaluation. While the update frequency can sometimes be optimized to improve performance, real-world applications often impose constraints, allowing updates only between deployments on the actual system. This blog post emphasizes the need for evaluations using consistent update frequencies across different algorithms to provide researchers and practitioners with clearer comparisons under realistic constraints.
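The constraint described above can be made concrete with a minimal sketch of a model-based RL training loop in which the update frequency is an explicit, fixed parameter rather than an implicit implementation detail. All names here (`run`, the placeholder policy and dynamics) are hypothetical illustrations, not the interface of any specific benchmark:

```python
# Minimal sketch: a model-based RL loop where the model may only be
# (re)trained every `update_freq` environment steps, mimicking a
# real-world setting where updates happen between deployments.
# All names and the toy dynamics are hypothetical.

import random

def run(total_steps=1000, update_freq=200, seed=0):
    rng = random.Random(seed)
    buffer = []            # transitions collected since the last update
    model_updates = 0      # number of times the model was (re)trained

    state = 0.0            # toy scalar environment state
    for step in range(1, total_steps + 1):
        action = rng.uniform(-1, 1)   # placeholder policy
        next_state = state + action   # placeholder dynamics
        buffer.append((state, action, next_state))
        state = next_state

        # The update frequency is explicit and consistent: the model is
        # retrained only at deployment boundaries, never opportunistically.
        if step % update_freq == 0:
            model_updates += 1        # a real agent would train on `buffer` here
            buffer.clear()

    return model_updates

print(run())  # 1000 steps with update_freq=200 -> 5 model updates
```

Holding `update_freq` fixed across the algorithms being compared is what makes the benchmark interpretation unambiguous: every method sees the same deployment schedule.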
Published at the Third Blogpost Track at ICLR 2024. Check the blog post here.