Mohammad Rezaei, Jens Lehmann, Sahar Vahdati
Develop a library that supports process-based reward modeling for any standard LLM training pipeline. Process-based rewards score each intermediate reasoning step rather than only the final answer, which is what is needed to train reliable multi-step reasoning without relying on outcome-only labels.
Suggested repo: StepRewards
"Reward the logic, not just the final answer."
Estimated effort: 50h
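
To make the idea concrete, here is a minimal sketch of step-level scoring, as opposed to a single outcome label. Everything in it is an assumption for illustration: the names score_steps, ProcessReward and toy_arithmetic_scorer are hypothetical, the toy scorer is a regex arithmetic check standing in for a learned process reward model, and min-aggregation is just one possible way to combine step scores.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical interface: a scorer maps (question, previous steps, current step)
# to a score in [0, 1]. A real library would plug a learned model in here.
StepScorer = Callable[[str, List[str], str], float]


@dataclass
class ProcessReward:
    step_scores: List[float]

    @property
    def aggregate(self) -> float:
        # Min-aggregation (an assumption): a chain is only as sound as its weakest step.
        return min(self.step_scores) if self.step_scores else 0.0

    @property
    def first_error(self) -> Optional[int]:
        # Index of the first step scored as wrong (0.0), or None if there is none.
        for i, score in enumerate(self.step_scores):
            if score == 0.0:
                return i
        return None


def score_steps(question: str, steps: List[str], scorer: StepScorer) -> ProcessReward:
    # Score each step given the question and the steps that came before it.
    scores = [scorer(question, steps[:i], step) for i, step in enumerate(steps)]
    return ProcessReward(scores)


_EQUATION = re.compile(r"(-?\d+)\s*([+\-*])\s*(-?\d+)\s*=\s*(-?\d+)")


def toy_arithmetic_scorer(question: str, previous_steps: List[str], step: str) -> float:
    # Placeholder for a learned reward model: checks one "a op b = c" claim per step.
    match = _EQUATION.search(step)
    if match is None:
        return 0.5  # nothing checkable in this step: neutral score
    a, op, b, c = int(match.group(1)), match.group(2), int(match.group(3)), int(match.group(4))
    expected = {"+": a + b, "-": a - b, "*": a * b}[op]
    return 1.0 if expected == c else 0.0


if __name__ == "__main__":
    steps = ["3 * 4 = 12", "12 + 5 = 18", "So the answer is 18"]
    reward = score_steps("What is 3 * 4 + 5?", steps, toy_arithmetic_scorer)
    print(reward.step_scores, reward.aggregate, reward.first_error)
```

An outcome-only label would mark this whole solution as wrong with no indication of where; the step-level scores locate the faulty step, which is the signal a training pipeline could use.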