Junzhe Wang, Zhiheng Xi, Yajie Yang, Hao Luo, Shihan Dou, Tao Gui, Qi Zhang
Build a RAG-agent training pipeline using Contribution-Weighted Group Relative Policy Optimization. The goal is to address the credit assignment problem in search agents that query multiple sources before responding: a single reward on the final answer does not indicate which searches actually helped.
Suggested repo: searchTune
"Make your search agent smarter: reward the right sources, not just the answer."
Estimated effort: 60h
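
To make the credit assignment idea concrete, here is a minimal sketch of how a group-relative advantage might be spread across the search rounds of each rollout. It is an illustration under stated assumptions, not the authors' formulation: the function names, the per-round contribution scores, and the normalization (weights scaled so that uniform contributions reduce to plain group-relative advantages) are all hypothetical.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize each rollout's reward against its sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def contribution_weighted_advantages(rewards, contributions):
    """Split each rollout-level advantage across its search rounds.

    rewards:       one final-answer reward per rollout in the group.
    contributions: one list per rollout of non-negative scores, one per
                   search round (hypothetical; e.g. from a judge model).
    """
    advantages = group_relative_advantages(rewards)
    weighted = []
    for adv, scores in zip(advantages, contributions):
        n = len(scores)
        if n == 0:
            # Rollout made no searches, so there is nothing to weight.
            weighted.append([])
            continue
        total = sum(scores)
        if total <= 0:
            # No usable signal: fall back to uniform weights.
            weights = [1.0 / n] * n
        else:
            weights = [s / total for s in scores]
        # Assumption: scale by n so uniform weights give back `adv`.
        weighted.append([adv * w * n for w in weights])
    return weighted


if __name__ == "__main__":
    rewards = [1.0, 0.0]                      # two rollouts, one correct
    contributions = [[1.0, 3.0], [1.0, 1.0]]  # per-round scores
    print(contribution_weighted_advantages(rewards, contributions))
```

In this sketch the second search round of the first rollout receives three times the advantage of the first round, while the rollout with equal contributions behaves exactly as it would under unweighted group-relative advantages.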