Ning Yang, Hengyu Zhong, Wentao Wang, Baoliang Tian, Haijun Zhang, Jun Wang
Implement a linear-memory distillation method to preserve short-text performance when extending model context windows.
Suggested repo: linear-ard
"Extend your context window without forgetting how to write short text."
Estimated effort: 60h
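The description implies two ingredients: a causal attention variant whose per-token state is constant in sequence length ("linear memory"), and a distillation loss that keeps the extended-context student close to the original short-context teacher on short inputs. The sketch below is a minimal, hypothetical illustration of both — the `elu+1` feature map, the function names, and the token-level KL objective are illustrative assumptions, not the method from the paper.

```python
import numpy as np

def linear_attention(q, k, v):
    """Causal linear attention. Memory is O(d*dv) regardless of sequence
    length: a running state replaces the T x T attention matrix.
    Feature map phi(x) = elu(x) + 1 is an illustrative assumption."""
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))  # elu(x)+1, strictly positive
    d, dv = q.shape[-1], v.shape[-1]
    S = np.zeros((d, dv))   # running sum of phi(k_t) v_t^T
    z = np.zeros(d)         # running sum of phi(k_t), for normalization
    out = np.empty_like(v)
    for t in range(q.shape[0]):
        qt, kt = phi(q[t]), phi(k[t])
        S += np.outer(kt, v[t])
        z += kt
        out[t] = (qt @ S) / (qt @ z + 1e-6)
    return out

def kl_distill_loss(teacher_logits, student_logits):
    """Token-level KL(teacher || student), averaged over positions.
    Applied on short sequences to preserve short-text behavior."""
    def log_softmax(x):
        x = x - x.max(axis=-1, keepdims=True)
        return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))
    lt = log_softmax(teacher_logits)
    ls = log_softmax(student_logits)
    return float((np.exp(lt) * (lt - ls)).sum(axis=-1).mean())
```

In a training loop, the distillation term would be evaluated on short sequences (where the frozen teacher is reliable) and added to the student's long-context language-modeling loss.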