Gaurav Rajesh Parikh, Angikar Ghosal
Create a standardized benchmark suite for social intelligence built on the structure of the 'Connections' word-grouping game. Compared with static QA datasets, this format offers a more nuanced way to test agent reasoning.
Suggested repo: social-bench
"Move beyond accuracy scores; test how your agents think, connect, and collaborate."
Estimated effort: 25h
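A Connections-style benchmark item and its grading loop might look roughly like the following. This is a minimal sketch that assumes the familiar Connections format (16 words hidden in four groups of four) and a limit of four wrong guesses. The class names `ConnectionsPuzzle` and `Episode`, the mistake limit, and the scoring fields are hypothetical. They are not taken from any existing social-bench code.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ConnectionsPuzzle:
    """One benchmark item: 16 unique words hidden in 4 groups of 4 (assumed format)."""
    groups: dict[str, frozenset[str]]  # category label -> its four member words

    def __post_init__(self) -> None:
        words = [w for members in self.groups.values() for w in members]
        if len(self.groups) != 4 or any(len(m) != 4 for m in self.groups.values()):
            raise ValueError("expected 4 groups of 4 words")
        if len(set(words)) != 16:
            raise ValueError("words must be unique across groups")


@dataclass
class Episode:
    """Tracks one agent's attempt at a puzzle, recording mistakes along the way."""
    puzzle: ConnectionsPuzzle
    max_mistakes: int = 4  # hypothetical limit, mirroring the usual game rules
    solved: list[str] = field(default_factory=list)
    mistakes: int = 0

    @property
    def finished(self) -> bool:
        return (self.mistakes >= self.max_mistakes
                or len(self.solved) == len(self.puzzle.groups))

    def guess(self, words: set[str]) -> str | None:
        """Return the matched category label, or None (and count a mistake)."""
        if self.finished:
            raise RuntimeError("episode is over")
        if len(words) != 4:
            raise ValueError("a guess must contain exactly 4 words")
        candidate = frozenset(words)
        for label, members in self.puzzle.groups.items():
            if label not in self.solved and candidate == members:
                self.solved.append(label)
                return label
        self.mistakes += 1
        return None

    def score(self) -> dict[str, float]:
        # Report process information (mistakes), not only the final outcome.
        return {
            "groups_solved": len(self.solved) / len(self.puzzle.groups),
            "mistakes": float(self.mistakes),
        }


# Usage: one correct guess followed by one wrong guess.
puzzle = ConnectionsPuzzle(groups={
    "fish": frozenset({"BASS", "PIKE", "SOLE", "CARP"}),
    "planets": frozenset({"MARS", "VENUS", "EARTH", "SATURN"}),
    "card games": frozenset({"POKER", "BRIDGE", "SNAP", "RUMMY"}),
    "colors": frozenset({"RED", "BLUE", "GREEN", "PINK"}),
})
episode = Episode(puzzle)
episode.guess({"BASS", "PIKE", "SOLE", "CARP"})
episode.guess({"MARS", "VENUS", "EARTH", "POKER"})
print(episode.score())  # {'groups_solved': 0.25, 'mistakes': 1.0}
```

The sketch reports mistakes alongside groups solved, which is one possible way to follow the "move beyond accuracy scores" goal: two agents that both finish a puzzle can still be told apart by how many wrong groupings they tried first. A multi-agent variant could let several agents propose and debate guesses before one is submitted to `Episode.guess`. That extension is left open here.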