Qing Zhu, Xian Yu
Provide a clean reference implementation of residuals-based offline RL to stabilize policy learning. This would be a valuable addition to existing RL toolkits like Stable-Baselines3.
Suggested repo: res-offline-rl
"Stop distribution shift from breaking your offline RL agent."
Estimated effort: 40h