arXiv · 6h ago

How to Fine-Tune a Reasoning Model? A Teacher-Student Cooperation Framework to Synthesize Student-Consistent SFT Data

Zixian Huang, Kaichen Yang, Xu Huang, Feiyang Hao, Qiming Ge, Bowen Li, He Du, Kai Chen, Qipeng Guo

Analysis

Viral velocity: low

Implementation gap: No

Novelty: 7/10

Category: paper

Topics: reasoning, fine-tuning, synthetic-data

Opportunity Brief

Build a toolkit that aligns teacher-generated SFT data with the student model's own output style. This addresses the performance drop seen when reasoning models are fine-tuned on stylistically mismatched synthetic data.

Suggested repo: alignSFT

"Stop training reasoning models on stylistic noise."

Estimated effort: 40h