Jixuan Leng, Si Si, Hsiang-Fu Yu, Vinod Raman, Inderjit S. Dhillon
Build an efficient library for group-wise DPO that scales across multiple candidate responses per prompt, improving sample efficiency over standard binary (pairwise) DPO.
Suggested repo: group-dpo
"Make preference alignment faster by using all your response data at once."
Estimated effort: 50h
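To make the idea concrete, here is a minimal sketch of what a group-wise DPO objective could look like, assuming a Plackett-Luce listwise generalization of the DPO reward (the function name `groupwise_dpo_loss` and all arguments are illustrative, not an API of any existing library). With K = 2 candidates it reduces to the familiar binary DPO loss:

```python
import math


def groupwise_dpo_loss(policy_logps, ref_logps, beta=0.1):
    """Hypothetical Plackett-Luce group-wise DPO loss for one prompt.

    policy_logps / ref_logps: sequence log-probabilities of the K
    candidate responses under the policy and the frozen reference
    model, ordered best-first according to the preference ranking.
    Reduces to standard binary DPO (-log sigmoid(r1 - r2)) when K == 2.
    """
    # Implicit DPO rewards: beta * (log pi(y|x) - log pi_ref(y|x))
    rewards = [beta * (p - r) for p, r in zip(policy_logps, ref_logps)]
    loss = 0.0
    # Plackett-Luce: at each rank k, the k-th item should beat the tail.
    for k in range(len(rewards) - 1):
        tail = rewards[k:]
        m = max(tail)  # log-sum-exp shift for numerical stability
        lse = m + math.log(sum(math.exp(x - m) for x in tail))
        loss += lse - rewards[k]  # -log softmax of item k over the tail
    return loss
```

Because every ranked candidate contributes to the gradient, one prompt with K responses supplies the signal of roughly K-1 pairwise comparisons in a single forward pass, which is where the sample-efficiency gain over binary DPO comes from.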