Haolong Hu, Hanyu Li, Tiancheng He, Huahui Yi, An Zhang, Qiankun Li, Kun Wang, Yang Liu, Zhigang Zeng
View original ↗Create an adversarial testing toolkit for multi-turn MLLMs that evolves synthetic dialogues to uncover long-context safety vulnerabilities. Help developers stress-test their agents before deployment.
Suggested repo: steer-test
"Don't get jailbroken on the 10th turn. Stress-test your vision agents."
Estimated effort: 50h