Sourav Ganguly, Kartik Pandit, Arnob Ghosh
Create an RL simulation environment that demonstrates robust agent learning against strategic adversaries. Focus on providing clean baseline implementations of optimistic policy learning with regret guarantees.
Suggested repo: adversarialRL
"Train agents that don't crack under pessimistic adversary pressure."
Estimated effort: 40h