Zhiyin Yu, Bo Zhang, Qibin Hou, Zhonghai Wu, Xiao Luo, Lei Bai
Create an open-source framework that implements 'Easy Samples' RL, focusing on data-efficient fine-tuning without massive manual labeling. This would help developers scale model training on limited high-quality data.
Suggested repo: easy-rl
"Reinforcement learning for LLMs, minus the reward hacking and high costs."
Estimated effort: 60h