Qi Zhang
Create an easy-to-use reinforcement learning (RL) library that implements the PODPO algorithm for training policies without gradient clipping. Such a library would simplify the training pipeline for complex control systems.
Suggested repo: nanoRL
"Reinforcement learning without the gradient-clipping headaches."
Estimated effort: 60h