Yao Chen, Jiawei Sheng, Wenyuan Zhang, Tingwen Liu
Build a tool that implements mixture-of-layers distillation, focusing on capturing the 'critical attention' paths of larger teacher models. Developers could use it to train significantly more efficient reasoning models for constrained environments (see the sketch below).
Suggested repo: distill-reason
"Small models, big thoughts: Distill reasoning logic, not just output."
Estimated effort: 100h
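Since the card doesn't spell out the paper's exact mechanism, here is a minimal PyTorch sketch of one plausible reading of "mixture-of-layers distillation": each student layer matches its attention maps against a learnable softmax mixture of teacher layers, combined with a standard temperature-scaled logit loss. Everything here is an illustrative assumption, not the authors' method or API: the `LayerMixer` module, the `distill_loss` function, the loss weights, HuggingFace-style model outputs (`.logits` and `.attentions` under `output_attentions=True`), and equal head counts and sequence lengths between teacher and student.

```python
import torch
import torch.nn.functional as F


class LayerMixer(torch.nn.Module):
    """Learnable softmax weights mixing teacher layers per student layer.

    Hypothetical reading of "mixture-of-layers": instead of a fixed
    1-to-1 layer map, each student layer gets a trained convex
    combination of all teacher attention maps as its target.
    """

    def __init__(self, n_teacher: int, n_student: int):
        super().__init__()
        self.logits = torch.nn.Parameter(torch.zeros(n_student, n_teacher))

    def forward(self, teacher_attns):
        # teacher_attns: tuple of L_t tensors, each (B, H, S, S).
        stacked = torch.stack(list(teacher_attns))  # (L_t, B, H, S, S)
        weights = self.logits.softmax(dim=-1)       # (L_s, L_t)
        # One mixed attention target per student layer.
        return torch.einsum("st,tbhij->sbhij", weights, stacked)


def distill_loss(teacher, student, mixer, batch, alpha=0.5, tau=2.0):
    """Logit KD plus attention transfer against mixed teacher layers."""
    with torch.no_grad():
        t_out = teacher(**batch, output_attentions=True)
    s_out = student(**batch, output_attentions=True)

    # Temperature-scaled soft-label loss on the output distributions.
    kd = F.kl_div(
        F.log_softmax(s_out.logits / tau, dim=-1),
        F.softmax(t_out.logits / tau, dim=-1),
        reduction="batchmean",
    ) * tau ** 2

    # Attention transfer: each student layer's attention rows are
    # distributions, so KL against the mixed teacher target applies.
    targets = mixer(t_out.attentions)  # (L_s, B, H, S, S)
    attn = sum(
        F.kl_div((s_attn + 1e-8).log(), targets[i], reduction="batchmean")
        for i, s_attn in enumerate(s_out.attentions)
    ) / len(s_out.attentions)

    return alpha * kd + (1 - alpha) * attn
```

Training the mixture weights jointly with the student lets the model discover which teacher layers carry the 'critical attention' signal, rather than committing to a hand-picked layer mapping up front.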