arXiv · 6h ago

TEMPER: Testing Emotional Perturbation in Quantitative Reasoning

Atahan Dokme, Benjamin Reichman, Larry Heck


Analysis

Viral velocity: low

Implementation gap: YES

Novelty: 6/10

Category: paper

Topics: reasoning, benchmark, llm

Opportunity Brief

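Build a benchmarking framework that tests LLM quantitative reasoning when the prompt carries emotional context, such as user frustration or anxiety. Comparing accuracy on neutral and emotionally framed versions of the same problems would show developers how real-world user frustration affects model performance.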
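A minimal sketch of such a harness follows, assuming the core loop is: wrap each problem in an emotional framing, query the model, extract a numeric answer, and report the accuracy drop relative to a neutral baseline. The framing templates, the names (`Problem`, `PERTURBATIONS`, `extract_answer`, `accuracy`, `emotional_drop`, `toy_model`) and the "last number in the reply" scoring rule are all hypothetical illustrations, not the TEMPER paper's method or dataset.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Problem:
    question: str
    answer: float


# Hypothetical emotional framings (illustrative only, not from the TEMPER paper).
PERTURBATIONS: Dict[str, str] = {
    "neutral": "{q}",
    "frustrated": "I've asked this three times and keep getting wrong answers. Just get it right: {q}",
    "anxious": "I'm really stressed, my exam is in an hour. Please help: {q}",
}

NUM_RE = re.compile(r"-?\d+(?:\.\d+)?")


def perturb(question: str, style: str) -> str:
    """Wrap a question in the chosen emotional framing."""
    return PERTURBATIONS[style].format(q=question)


def extract_answer(text: str) -> Optional[float]:
    """Assumed scoring rule: take the last number in the reply (commas stripped)."""
    matches = NUM_RE.findall(text.replace(",", ""))
    return float(matches[-1]) if matches else None


def accuracy(model: Callable[[str], str], problems: List[Problem],
             style: str, tol: float = 1e-6) -> float:
    """Fraction of problems answered correctly under one framing."""
    correct = 0
    for p in problems:
        pred = extract_answer(model(perturb(p.question, style)))
        if pred is not None and abs(pred - p.answer) <= tol:
            correct += 1
    return correct / len(problems)


def emotional_drop(model: Callable[[str], str], problems: List[Problem]) -> Dict[str, float]:
    """Accuracy lost under each non-neutral framing, relative to the neutral baseline."""
    base = accuracy(model, problems, "neutral")
    return {s: base - accuracy(model, problems, s)
            for s in PERTURBATIONS if s != "neutral"}


# Hypothetical toy model: correct unless the prompt mentions being "stressed".
TOY_ANSWERS = {"What is 12 * 7?": 84, "What is 15 + 27?": 42}


def toy_model(prompt: str) -> str:
    for q, a in TOY_ANSWERS.items():
        if q in prompt:
            return f"The answer is {a + 1 if 'stressed' in prompt else a}."
    return "I don't know."


if __name__ == "__main__":
    probs = [Problem(q, a) for q, a in TOY_ANSWERS.items()]
    print(emotional_drop(toy_model, probs))
```

With the toy model, the "frustrated" framing costs nothing while the "anxious" framing drops accuracy from 1.0 to 0.0. A real harness would replace `toy_model` with an API call and use a proper quantitative-reasoning dataset; reporting the drop per framing, rather than a single pooled score, keeps different emotional triggers distinguishable.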

Suggested repo: temper-bench

"Does your LLM crumble under pressure? Stress-test reasoning with emotional context."

Estimated effort: 20h