Yishu Wei, Yi Lin, Adam Flanders, George Shih, Yifan Peng
Create a generic template for using GRPO (Group Relative Policy Optimization) to align small LLMs for specialized classification tasks. The goal is to demonstrate how to boost accuracy on domain-specific tasks without sacrificing the base model's general reasoning ability.
Suggested repo: grpo-align
"Accuracy-focused RL alignment for small models."
Estimated effort: 110h
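
As a rough illustration of the core idea behind such a template, the sketch below computes group-relative advantages for one classification prompt: several completions are sampled, each is scored with an accuracy reward, and each reward is normalized against the mean and standard deviation of its own group. The function names (`accuracy_reward`, `group_relative_advantages`), the exact-match reward, the example labels, and the use of the population standard deviation are all assumptions made for illustration; they are not taken from the original project description, and a real template would likely delegate the policy update to an RL training library.

```python
from statistics import mean, pstdev


def accuracy_reward(predicted_label: str, gold_label: str) -> float:
    """Hypothetical reward: 1.0 for an exact (case-insensitive) label match, else 0.0."""
    return 1.0 if predicted_label.strip().lower() == gold_label.strip().lower() else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each reward against its own sampling group.

    Each advantage is (reward - group mean) / (group std + eps), so a
    completion is credited only for doing better than its siblings.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# One prompt, four sampled completions (illustrative labels only).
completions = ["positive", "negative", "Positive", "neutral"]
gold = "positive"

rewards = [accuracy_reward(c, gold) for c in completions]
advantages = group_relative_advantages(rewards)

for completion, reward, advantage in zip(completions, rewards, advantages):
    print(f"{completion!r}: reward={reward}, advantage={advantage:+.3f}")
```

Note that when every completion in a group receives the same reward, all advantages are zero and that prompt contributes no learning signal, which is one reason the group size and the reward design matter for classification tasks with few labels.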