Jianhao Huang, Zhanpeng Zhou, Renqiu Xia, Baharan Mirzasoleiman, Weijie Su, Wei Huang
Implement a multi-token prediction (MTP) training wrapper for standard transformer architectures. MTP is the key to unlocking better planning capabilities in standard models without increasing compute costs during inference, since the extra prediction heads are used only at training time.
Suggested repo: mtpTrain
"Improve model planning with multi-token prediction heads."
Estimated effort: 50h
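A minimal sketch of what such a wrapper could look like, assuming PyTorch. The idea: attach `n_heads` linear heads to a shared backbone, where head `k` is trained to predict the token `k+1` steps ahead; at inference only head 0 is used, so generation cost is unchanged. All names here (`MTPWrapper`, `backbone`, `n_heads`) are illustrative assumptions, not the paper's released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MTPWrapper(nn.Module):
    """Hypothetical MTP training wrapper: a shared backbone maps token ids
    to hidden states (B, T, H); head k predicts the token k+1 steps ahead."""

    def __init__(self, backbone, hidden_dim, vocab_size, n_heads=4):
        super().__init__()
        self.backbone = backbone
        self.heads = nn.ModuleList(
            nn.Linear(hidden_dim, vocab_size) for _ in range(n_heads)
        )

    def forward(self, input_ids):
        h = self.backbone(input_ids)              # (B, T, H)
        return [head(h) for head in self.heads]   # n_heads x (B, T, V)

    def loss(self, input_ids, targets):
        """targets: (B, T) next-token labels. Head k is trained on labels
        shifted k positions further into the future, so the model must
        commit to a short plan rather than a single greedy step."""
        all_logits = self.forward(input_ids)
        total, count = 0.0, 0
        for k, logits in enumerate(all_logits):
            if k >= targets.size(1):
                break
            # head k at position t predicts targets[:, t + k]
            logits_k = logits[:, : targets.size(1) - k]
            targets_k = targets[:, k:]
            total = total + F.cross_entropy(
                logits_k.reshape(-1, logits_k.size(-1)),
                targets_k.reshape(-1),
            )
            count += 1
        return total / count
```

For inference, one would keep only `self.heads[0]` (standard next-token prediction), which is why the approach adds no decode-time compute; the remaining heads act purely as an auxiliary training signal.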