Qihan Ren, Peng Wang, Ruikun Cai, Shuai Shao, Dadi Guo, Yuejin Xie, Yafu Li, Quanshi Zhang, Xia Hu, Jing Shao, Dongrui Liu
Create a diagnostic dashboard that measures cross-domain generalization in CoT-finetuned models. It helps developers tell whether their model is 'over-optimized' (memorizing its training data) or genuinely reasoning in new domains.
Suggested repo: GeneralizeCheck
"Is your model really reasoning, or just memorizing? Visualize SFT optimization bottlenecks."
Estimated effort: 30h
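One possible starting point for the dashboard's core metric is the gap between accuracy on domains seen during finetuning and accuracy on held-out domains. The sketch below is only an illustration under stated assumptions: the function names (`accuracy_by_domain`, `generalization_gap`), the `(domain, is_correct)` record format, and the use of an unweighted mean over domains are all hypothetical choices, not part of the original project description.

```python
from collections import defaultdict


def accuracy_by_domain(records):
    """Compute per-domain accuracy.

    `records` is an iterable of (domain, is_correct) pairs. This format is
    an assumption made for this sketch, not a fixed interface.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for domain, is_correct in records:
        totals[domain] += 1
        hits[domain] += int(bool(is_correct))
    return {domain: hits[domain] / totals[domain] for domain in totals}


def generalization_gap(records, train_domains):
    """Compare mean accuracy on training domains with held-out domains.

    Returns a dict with the unweighted mean accuracy over in-domain and
    out-of-domain groups, plus their difference ("gap").
    """
    acc = accuracy_by_domain(records)
    in_dom = [a for d, a in acc.items() if d in train_domains]
    out_dom = [a for d, a in acc.items() if d not in train_domains]
    if not in_dom or not out_dom:
        raise ValueError("need at least one in-domain and one held-out domain")
    in_mean = sum(in_dom) / len(in_dom)
    out_mean = sum(out_dom) / len(out_dom)
    return {
        "in_domain": in_mean,
        "out_of_domain": out_mean,
        "gap": in_mean - out_mean,
    }


if __name__ == "__main__":
    # Toy evaluation results; the domains and outcomes are made up.
    demo = [
        ("math", True), ("math", True), ("math", False), ("math", True),
        ("law", True), ("law", False),
        ("code", False), ("code", False),
    ]
    print(generalization_gap(demo, train_domains={"math"}))
```

A large positive gap is only a signal worth plotting and investigating. On its own it does not establish that a model has memorized its training data, since held-out domains may simply be harder.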