Thomas Rivasseau, Benjamin Fung
View original ↗Develop an adversarial 'red-teaming' environment designed to test agent alignment under corporate-style incentives. This platform will help companies measure how likely their agents are to prioritize profit over safety.
Suggested repo: red-agent
"Stress-test your corporate AI agents against dark incentives."
Estimated effort: 100h