Jixuan Leng, Si Si, Hsiang-Fu Yu, Vinod Raman, Inderjit S. Dhillon
Build an efficient library for group-wise DPO that scales across multiple candidate responses per prompt, improving sample efficiency over standard binary (pairwise) DPO.
Suggested repo: group-dpo
"Make preference alignment faster by using all your response data at once."
Estimated effort: 50h
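To make the idea concrete, here is a minimal sketch of what a group-wise DPO objective could look like, assuming a Plackett-Luce listwise generalization of the DPO reward (the function name `groupwise_dpo_loss` and all arguments are illustrative, not an API of any existing library). With K = 2 candidates it reduces to the familiar binary DPO loss:

```python
import math


def groupwise_dpo_loss(policy_logps, ref_logps, beta=0.1):
    """Hypothetical Plackett-Luce group-wise DPO loss for one prompt.

    policy_logps / ref_logps: sequence log-probabilities of the K
    candidate responses under the policy and the frozen reference
    model, ordered best-first according to the preference ranking.
    Reduces to standard binary DPO (-log sigmoid(r1 - r2)) when K == 2.
    """
    # Implicit DPO rewards: beta * (log pi(y|x) - log pi_ref(y|x))
    rewards = [beta * (p - r) for p, r in zip(policy_logps, ref_logps)]
    loss = 0.0
    # Plackett-Luce: at each rank k, the k-th item should beat the tail.
    for k in range(len(rewards) - 1):
        tail = rewards[k:]
        m = max(tail)  # log-sum-exp shift for numerical stability
        lse = m + math.log(sum(math.exp(x - m) for x in tail))
        loss += lse - rewards[k]  # -log softmax of item k over the tail
    return loss
```

Because every ranked candidate contributes to the gradient, one prompt with K responses supplies the signal of roughly K-1 pairwise comparisons in a single forward pass, which is where the sample-efficiency gain over binary DPO comes from.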