Create a lightweight, standalone library that implements preference and reinforcement fine-tuning (e.g., DPO, PPO) for open-weight models running locally. Most current implementations are heavyweight or built around large enterprise clusters, which makes them impractical on a single workstation.
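
To illustrate how small the core of one such method can be, here is a minimal sketch of the per-example DPO loss in plain Python. The function name `dpo_loss`, its argument names, and the default `beta=0.1` are assumptions made for illustration and do not come from any existing library. A real implementation would obtain the log-probabilities from the policy and a frozen reference model, and would operate on batched tensors.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss from summed sequence log-probabilities.

    Hypothetical helper: each argument is the log-probability of a full
    response (chosen or rejected) under the policy or the reference model.
    """
    # How much more the policy favors each response than the reference does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    # Scaled preference margin; beta controls deviation from the reference.
    z = beta * (chosen_logratio - rejected_logratio)

    # -log(sigmoid(z)) == softplus(-z), written in a numerically stable form.
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


if __name__ == "__main__":
    # Policy identical to the reference: the margin is zero.
    print(dpo_loss(-12.0, -15.0, -12.0, -15.0))
    # Policy has shifted probability toward the chosen response.
    print(dpo_loss(-10.0, -18.0, -12.0, -15.0))
```

When the policy matches the reference, the margin is zero and the loss equals log 2. The loss falls as the policy raises the chosen response relative to the rejected one. PPO needs noticeably more machinery (rollouts, a reward signal, and a value estimate), so it is not sketched here.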