Hiroki Fukui
View original ↗Create an automated multi-agent framework to test how different alignment techniques and instructions influence ethical behavior under pressure. This is a must-have for safety researchers developing 'hardened' LLMs.
Suggested repo: align-sim
"Stress-test your model's ethical alignment using multi-agent simulations."
Estimated effort: 45h