Jiawei Huang, Qingping Yang, Renjie Zheng, Jiaze Chen
Build a library that translates raw SWE agent traces into multi-step rubric rewards. This allows developers to fine-tune agents using intermediate milestones rather than just pass/fail test results.
Suggested repo: rubricRL
"Stop training your agents on pass/fail—train them on logic."
Estimated effort: 40h
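
To make the idea of "multi-step rubric rewards" concrete, here is a minimal sketch in Python of how a trace might be scored against a rubric. Everything in it is an assumption made for illustration: the `TraceStep` schema, the `Milestone` type, the `rubric_rewards` function, and the example milestones and weights are hypothetical and do not describe an existing rubricRL API. The sketch assumes each milestone pays out once, at the first step that satisfies it, so the per-step rewards sum to the total rubric score.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class TraceStep:
    """One agent action and the observation it produced (hypothetical schema)."""
    action: str
    observation: str


@dataclass(frozen=True)
class Milestone:
    """A rubric item, awarded once at the first step whose check passes."""
    name: str
    weight: float
    check: Callable[[TraceStep], bool]


def rubric_rewards(trace: Sequence[TraceStep],
                   rubric: Sequence[Milestone]) -> List[float]:
    """Return one reward per step; each milestone pays out at most once."""
    earned = set()
    rewards = []
    for step in trace:
        reward = 0.0
        for milestone in rubric:
            if milestone.name not in earned and milestone.check(step):
                earned.add(milestone.name)
                reward += milestone.weight
        rewards.append(reward)
    return rewards


# Illustrative rubric: the milestone names, checks, and weights are made up.
RUBRIC = [
    Milestone("reproduced_failure", 0.25,
              lambda s: s.action.startswith("pytest") and "FAILED" in s.observation),
    Milestone("edited_source", 0.25,
              lambda s: s.action.startswith("edit ")),
    Milestone("tests_pass", 0.5,
              lambda s: s.action.startswith("pytest")
              and "passed" in s.observation
              and "FAILED" not in s.observation),
]

# Illustrative trace: reproduce the bug, inspect, edit, re-run the tests.
TRACE = [
    TraceStep("pytest tests/test_parser.py", "1 FAILED"),
    TraceStep("cat parser.py", "def parse(...): ..."),
    TraceStep("edit parser.py", "ok"),
    TraceStep("pytest tests/test_parser.py", "1 passed"),
]

rewards = rubric_rewards(TRACE, RUBRIC)
print(rewards)  # [0.25, 0.0, 0.25, 0.5]
```

A pure pass/fail signal would give this trace a single reward at the last step. Here the reproduction and the edit also earn credit, which is the kind of intermediate signal the project description has in mind. A real implementation would likely need richer checks than string matching on actions and observations.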