Renqi Chen, Zeyin Tao, Jianming Guo, Jing Wang, Zezhou Xu, Jingzhe Zhu, Qingqing Sun, Tianyi Zhang, Shuai Chen
Create an open-source evaluation suite simulating e-commerce risk environments for GUI agents. This allows developers to test their agents against adversarial risk management scenarios.
Suggested repo: riskBench
"Can your agent stop fraud? Stress-test GUI agents in realistic e-commerce risk scenarios."
Estimated effort: 100h