Build a verifiable environment for shopping agents that rewards a transaction only when the completed purchase satisfies the user's constraints. This fills a gap in domain-specific agent evaluation.
Suggested repo: shop-rl
"Train agents that know how to finish a transaction."
Estimated effort: 80h
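
One way the reward check could look, as a minimal sketch only. The `Constraints` and `Order` schemas, their field names, and the all-or-nothing 0/1 scoring are assumptions made for illustration; the idea above does not specify any of them.

```python
from dataclasses import dataclass, field


@dataclass
class Constraints:
    """What the user asked for (hypothetical schema)."""
    category: str
    max_total: float
    required_attrs: dict = field(default_factory=dict)


@dataclass
class Order:
    """Final state of the agent's episode (hypothetical schema)."""
    completed: bool          # True only if checkout actually finished
    category: str = ""
    total: float = 0.0
    attrs: dict = field(default_factory=dict)


def reward(constraints: Constraints, order: Order) -> float:
    """Return 1.0 if checkout finished and every constraint holds, else 0.0."""
    if not order.completed:
        return 0.0
    if order.category != constraints.category:
        return 0.0
    if order.total > constraints.max_total:
        return 0.0
    # Every attribute the user required must match exactly.
    for key, wanted in constraints.required_attrs.items():
        if order.attrs.get(key) != wanted:
            return 0.0
    return 1.0


if __name__ == "__main__":
    want = Constraints("headphones", 100.0, {"color": "black"})
    bought = Order(True, "headphones", 89.99, {"color": "black", "wireless": True})
    print(reward(want, bought))
```

Because the check is a pure function of the user's constraints and the final order, the same episode always scores the same, which is what makes the reward verifiable. Partial credit (for example, right item but over budget) would be a separate design choice.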