Donnie Prakoso
Create a lightweight, standalone library that implements preference- and reinforcement-based fine-tuning (e.g., DPO, PPO) for local open-weight models. Most current implementations carry heavy dependencies or assume access to large enterprise GPU clusters.
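The core of such a library is small: each method reduces to a per-sample loss over log-probabilities that the model already produces. The sketch below shows the standard DPO loss and the PPO clipped surrogate objective in plain Python, so the arithmetic is visible without a tensor framework. The function names (`dpo_loss`, `ppo_clip_loss`) and their defaults (`beta=0.1`, `clip_eps=0.2`) are hypothetical illustrations, not an existing nanoRL API. A real implementation would compute the same quantities on batched tensors.

```python
import math
from typing import Sequence


def _neg_log_sigmoid(z: float) -> float:
    """Numerically stable -log(sigmoid(z)), i.e. softplus(-z)."""
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


def dpo_loss(
    policy_chosen: Sequence[float],
    policy_rejected: Sequence[float],
    ref_chosen: Sequence[float],
    ref_rejected: Sequence[float],
    beta: float = 0.1,
) -> float:
    """Mean DPO loss over a batch of preference pairs (hypothetical helper).

    Each argument holds the summed log-probability of a full response
    (chosen = preferred, rejected = dispreferred) under the trainable
    policy or the frozen reference model.
    """
    losses = []
    for pc, pr, rc, rr in zip(policy_chosen, policy_rejected, ref_chosen, ref_rejected):
        # Implicit reward margin: how much more the policy (relative to the
        # reference) favours the chosen response over the rejected one.
        margin = (pc - rc) - (pr - rr)
        losses.append(_neg_log_sigmoid(beta * margin))
    return sum(losses) / len(losses)


def ppo_clip_loss(
    logp_new: Sequence[float],
    logp_old: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,
) -> float:
    """Mean negated PPO clipped surrogate objective, to be minimised (hypothetical helper)."""
    terms = []
    for new, old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(new - old)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: keep the smaller of the unclipped and clipped terms.
        terms.append(min(ratio * adv, clipped * adv))
    return -sum(terms) / len(terms)


if __name__ == "__main__":
    # Identical log-probs give a zero margin, so the DPO loss equals log(2).
    print(dpo_loss([-1.0], [-1.0], [-1.0], [-1.0]))
    # A probability ratio of 1.5 with a positive advantage is clipped to 1.2.
    print(ppo_clip_loss([math.log(1.5)], [0.0], [1.0]))
```

DPO needs only a frozen reference model and preference pairs, while PPO additionally needs advantage estimates (typically from a reward model and a value function). That difference is one reason DPO is often the lighter option on consumer hardware.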
Suggested repo: nanoRL
"Master RL-based fine-tuning on consumer hardware with 66% accuracy gains."
Estimated effort: 100h