Pengrui Lu, Bingyu Xu, Wenjun Zhang, Shengjia Hua, Xuanjian Gao, Ranxiang Ge, Lyumanshan Ye, Linxuan Wu, Yiran Li, Junfei Fish Yu, Yibo Zhang, Ruixin Li, Manxiang Li, Xiao Han, Xiaocong Zhou, Guangyao Chi, Zisheng Chen, Kaishen Chen, Kun Wang, Qihua Xu, Fengyue Meng, Yuchen Ni, Jiajun Li, Jinxiu Liu, Danfeng Zhang, Jingru Zhao, Pengfei Liu
Build an 'AlphaEval' framework for production AI agents that handles multimodal, implicit-constraint environments. Focus on telemetry integration for live monitoring of agent performance.
Suggested repo: AgentOps-Eval
"Finally, an evaluation suite for real-world agents, not just benchmarks."
Estimated effort: 90h
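
As a starting point for the telemetry piece of the brief above, here is a minimal sketch of a rolling-window monitor for agent runs. It is only one possible shape for live monitoring, not the AlphaEval design. The names `AgentEvent` and `RollingMonitor`, the event fields, and the window size are all hypothetical and chosen for illustration.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Optional


@dataclass
class AgentEvent:
    """One completed agent task (hypothetical schema, not from the source)."""
    task_id: str
    modality: str      # e.g. "text" or "image"
    success: bool
    latency_s: float


class RollingMonitor:
    """Keeps the most recent `window` events and exposes live summary metrics."""

    def __init__(self, window: int = 100) -> None:
        # deque(maxlen=...) silently drops the oldest event once the window is full
        self.events: Deque[AgentEvent] = deque(maxlen=window)

    def record(self, event: AgentEvent) -> None:
        self.events.append(event)

    def success_rate(self, modality: Optional[str] = None) -> Optional[float]:
        """Fraction of successful events, optionally filtered by modality."""
        selected = [e for e in self.events
                    if modality is None or e.modality == modality]
        if not selected:
            return None  # no data yet for this slice
        return sum(e.success for e in selected) / len(selected)

    def mean_latency(self) -> Optional[float]:
        """Average latency in seconds over the current window."""
        if not self.events:
            return None
        return sum(e.latency_s for e in self.events) / len(self.events)


# Usage: a window of 3, so the fourth event evicts the first.
monitor = RollingMonitor(window=3)
monitor.record(AgentEvent("t1", "text", True, 1.0))
monitor.record(AgentEvent("t2", "image", False, 3.0))
monitor.record(AgentEvent("t3", "text", True, 2.0))
monitor.record(AgentEvent("t4", "text", False, 4.0))

print(monitor.success_rate("text"))  # computed over t3 and t4 only
print(monitor.mean_latency())
```

A bounded window keeps memory constant and makes the metrics reflect recent behavior, which is usually what a live dashboard needs. A real deployment would likely export these values to an existing metrics backend instead of holding them in process.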