Channy Yun (윤석찬)
Build a performance benchmarking tool for Claude 4.7 vs. other top models in agentic scenarios. Quantify the 'reasoning' performance gap in long-running tasks.
Suggested repo: bench-agent
"Measure which agent model actually gets the work done."
Estimated effort: 40h
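
One way the "performance gap" could be quantified is as the difference in task success rate between two models, restricted to tasks with a long step budget. The sketch below is only an illustration under that assumption: `RunResult`, `step_budget`, `success_rate_by_model`, and `success_gap` are hypothetical names, and the idea itself does not specify a metric.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RunResult:
    """One agent run on one task (hypothetical record shape)."""
    model: str         # label of the model under test
    task_id: str       # identifier of the benchmark task
    step_budget: int   # steps the task allows; used as a proxy for "long-running"
    succeeded: bool    # whether the agent completed the task


def success_rate_by_model(results):
    """Return {model: fraction of runs that succeeded}."""
    totals = defaultdict(int)
    wins = defaultdict(int)
    for r in results:
        totals[r.model] += 1
        wins[r.model] += int(r.succeeded)
    return {model: wins[model] / totals[model] for model in totals}


def success_gap(results, model_a, model_b, min_step_budget=0):
    """Success rate of model_a minus model_b on tasks with a large enough budget.

    Raises KeyError if either model has no runs at or above min_step_budget.
    """
    long_runs = [r for r in results if r.step_budget >= min_step_budget]
    rates = success_rate_by_model(long_runs)
    return rates[model_a] - rates[model_b]
```

Calling `success_gap(results, "model-a", "model-b", min_step_budget=50)` would then report how much more often the first model finishes tasks that allow at least 50 steps. Success rate is only one candidate metric; cost, wall-clock time, or steps to completion could be compared the same way.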