Ning Yang, Hengyu Zhong, Wentao Wang, Baoliang Tian, Haijun Zhang, Jun Wang
Implement a linear-memory distillation method to preserve short-text performance when extending model context windows.
Suggested repo: linear-ard
"Extend your context window without forgetting how to write short text."
Estimated effort: 60h
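The description implies two ingredients: a causal attention variant whose per-token state is constant in sequence length ("linear memory"), and a distillation loss that keeps the extended-context student close to the original short-context teacher on short inputs. The sketch below is a minimal, hypothetical illustration of both — the `elu+1` feature map, the function names, and the token-level KL objective are illustrative assumptions, not the method from the paper.

```python
import numpy as np

def linear_attention(q, k, v):
    """Causal linear attention. Memory is O(d*dv) regardless of sequence
    length: a running state replaces the T x T attention matrix.
    Feature map phi(x) = elu(x) + 1 is an illustrative assumption."""
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))  # elu(x)+1, strictly positive
    d, dv = q.shape[-1], v.shape[-1]
    S = np.zeros((d, dv))   # running sum of phi(k_t) v_t^T
    z = np.zeros(d)         # running sum of phi(k_t), for normalization
    out = np.empty_like(v)
    for t in range(q.shape[0]):
        qt, kt = phi(q[t]), phi(k[t])
        S += np.outer(kt, v[t])
        z += kt
        out[t] = (qt @ S) / (qt @ z + 1e-6)
    return out

def kl_distill_loss(teacher_logits, student_logits):
    """Token-level KL(teacher || student), averaged over positions.
    Applied on short sequences to preserve short-text behavior."""
    def log_softmax(x):
        x = x - x.max(axis=-1, keepdims=True)
        return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))
    lt = log_softmax(teacher_logits)
    ls = log_softmax(student_logits)
    return float((np.exp(lt) * (lt - ls)).sum(axis=-1).mean())
```

In a training loop, the distillation term would be evaluated on short sequences (where the frozen teacher is reliable) and added to the student's long-context language-modeling loss.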