Atahan Dokme, Benjamin Reichman, Larry Heck
Build a benchmarking framework that tests reasoning under emotional stress. This helps developers understand how real-world user frustration impacts AI performance.
Suggested repo: temper-bench
"Does your LLM crumble under pressure? Stress-test reasoning with emotional context."
Estimated effort: 20h
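
One way the core loop of such a benchmark could look is sketched below, under loud assumptions: the framing templates, the `evaluate` function, the exact-match scoring, and the `toy_model` stand-in are all hypothetical and not taken from the original project. The idea is to ask the same reasoning questions under a neutral framing and under emotionally loaded framings, then compare accuracy per framing.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical emotional framings (illustrative only, not from the original project).
FRAMINGS: Dict[str, str] = {
    "neutral": "{question}",
    "frustrated": "I've asked this three times already and I'm losing patience. {question}",
    "anxious": "I'm really stressed and my deadline is in ten minutes. {question}",
}


def evaluate(
    model: Callable[[str], str], items: List[Tuple[str, str]]
) -> Dict[str, float]:
    """Return accuracy per framing, using exact match on the stripped answer."""
    scores: Dict[str, float] = {}
    for name, template in FRAMINGS.items():
        # Wrap every question in the framing and count exact-match answers.
        correct = sum(
            model(template.format(question=question)).strip() == answer
            for question, answer in items
        )
        scores[name] = correct / len(items)
    return scores


def toy_model(prompt: str) -> str:
    """Stand-in for a real LLM call: fails whenever the prompt mentions a deadline."""
    if "deadline" in prompt:
        return "unsure"
    return "4" if "2 + 2" in prompt else "9"


# Tiny assumed dataset of (question, expected answer) pairs.
ITEMS: List[Tuple[str, str]] = [("What is 2 + 2?", "4"), ("What is 3 * 3?", "9")]

if __name__ == "__main__":
    for framing, accuracy in evaluate(toy_model, ITEMS).items():
        print(f"{framing}: {accuracy:.2f}")
```

In a real run, `toy_model` would be replaced by a call to the model under test, and the gap between the neutral score and each emotional framing's score would be the quantity of interest.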