Suraj Yadav, Siddharth Yadav, Parth Goyal
View original ↗Develop a benchmarking toolkit that maps the plateau points of SLMs during GRPO/RL tuning. This tool should provide automated 'difficulty-scaling' analysis to help developers stop training once they hit diminishing returns.
Suggested repo: grpo-limit
"Know exactly when your SLM stops learning: Automated difficulty-plateau detection."
Estimated effort: 20h