Jie Sun, Yu Liu, Lu Han, Qiwen Deng, Xiang Shu, Yang Xiao, Xingyu Lu, Jun Zhou, Pengfei Liu, Lintao Ma, Jiancan Wu, Xiang Wang
Implement a training-free attention-modification layer that prevents attention dispersion over long numerical sequences. This is a highly reusable component for any math-heavy or data-heavy transformer model (see the sketch below).
Suggested repo: sep-attn
"Stop your LLM from losing focus on long numerical sequences without retraining."
Estimated effort: 25h
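
As a concrete starting point, below is a minimal sketch of one way such a layer could work, assuming a temperature-scaling mechanism: queries at numeric-token positions get a softmax temperature below 1, so their attention concentrates on a few relevant keys rather than dispersing across a long context. The function name `sharpened_attention`, the default `tau=0.7`, and the boolean numeric-token mask are all illustrative assumptions, not the paper's exact method.

```python
# Illustrative sketch only -- the mechanism (per-query temperature scaling),
# the name `sharpened_attention`, and the value tau=0.7 are assumptions,
# not the authors' published method.
import torch
import torch.nn.functional as F

def sharpened_attention(q, k, v, numeric_mask, tau=0.7):
    """Causal scaled-dot-product attention with sharpened softmax at
    numeric-token query positions.

    q, k, v:      (batch, heads, seq, head_dim)
    numeric_mask: (batch, seq) bool, True where the token is numeric
    tau:          temperature < 1 sharpens the attention distribution
    """
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5          # (b, h, s_q, s_k)

    # Per-query temperature: numeric positions get tau, others stay at 1.0.
    temp = torch.ones_like(numeric_mask, dtype=q.dtype)
    temp = temp.masked_fill(numeric_mask, tau)
    scores = scores / temp[:, None, :, None]             # broadcast over heads, keys

    # Standard causal mask: each query attends only to itself and the past.
    s = q.size(-2)
    causal = torch.triu(torch.ones(s, s, dtype=torch.bool, device=q.device),
                        diagonal=1)
    scores = scores.masked_fill(causal, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

# Toy usage: mark positions 3-5 as numeric tokens and run one attention pass.
b, h, s, d = 1, 4, 8, 16
q, k, v = (torch.randn(b, h, s, d) for _ in range(3))
numeric = torch.zeros(b, s, dtype=torch.bool)
numeric[:, 3:6] = True
out = sharpened_attention(q, k, v, numeric)              # (1, 4, 8, 16)
```

Because this only rescales attention logits at inference time, it requires no retraining and could be patched into an existing model's attention (e.g., via a forward hook or by swapping the attention function), which is what makes the component reusable across models.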