Zixian Huang, Kaichen Yang, Xu Huang, Feiyang Hao, Qiming Ge, Bowen Li, He Du, Kai Chen, Qipeng Guo
Build a toolkit that synchronizes the output style of teacher models with student models during SFT. This addresses the performance drop seen when reasoning models are trained on mismatched synthetic data.
Suggested repo: alignSFT
"Stop training reasoning models on stylistic noise."
Estimated effort: 40h
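The toolkit described above needs some way to measure how far a teacher's output style is from the student's. The block below is an illustrative sketch under stated assumptions, not the authors' method. It represents "style" with a few hand-picked surface features: length, density of reflection markers, line-break density, and mean sentence length. It then keeps the teacher responses closest to the centroid of the student's own outputs before SFT. The names `style_features`, `select_style_aligned`, and `REFLECTION_MARKERS` are hypothetical. A real toolkit would more likely use model-based signals such as the student's perplexity on teacher text.

```python
import math
import re
from statistics import mean

# Hypothetical marker list: phrases common in long reasoning traces.
# Chosen for illustration only, not taken from the source.
REFLECTION_MARKERS = ("wait", "hmm", "alternatively", "let me check")


def style_features(text: str) -> list[float]:
    """Map one response to a small hand-picked style vector (assumption)."""
    words = text.split()
    n_words = max(len(words), 1)
    lowered = text.lower()
    marker_hits = sum(lowered.count(m) for m in REFLECTION_MARKERS)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [
        math.log1p(len(words)),                   # overall length, log-scaled
        marker_hits / n_words,                    # reflection-marker density
        text.count("\n") / n_words,               # line-break density
        n_words / max(len(sentences), 1) / 100,   # mean sentence length, scaled
    ]


def centroid(vectors: list[list[float]]) -> list[float]:
    """Per-dimension mean of a list of feature vectors."""
    return [mean(col) for col in zip(*vectors)]


def distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_style_aligned(teacher_outputs, student_outputs, keep):
    """Keep the `keep` teacher outputs whose style is closest to the student's."""
    target = centroid([style_features(s) for s in student_outputs])
    ranked = sorted(teacher_outputs,
                    key=lambda t: distance(style_features(t), target))
    return ranked[:keep]


if __name__ == "__main__":
    student = ["The answer is 4 because 2 + 2 = 4.",
               "Add the numbers: 3 + 5 = 8."]
    teacher = [
        "Wait, let me check. Hmm, 2 + 2... alternatively count on fingers.\nWait.\nSo 4.",
        "2 + 2 = 4, so the answer is 4.",
    ]
    print(select_style_aligned(teacher, student, keep=1))
```

Run on the toy data, this keeps the terse teacher answer and drops the marker-heavy one, because the student never produces reflection markers or line breaks. Filtering is only one design option. A toolkit could instead rewrite teacher outputs toward the student's style, which keeps every sample at the cost of an extra generation pass.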