arXiv6h ago

Beyond Social Pressure: Benchmarking Epistemic Attack in Large Language Models

Steven Au, Sujit Noronha

View original ↗

Analysis

Viral velocity

low

Implementation gapYES

Novelty7/10

Categorypaper

Topics

reasoningbenchmarkingsafety

Opportunity Brief

Create a harness that automates epistemic attack testing for models. Developers can build an evaluation platform that allows teams to test their models against challenging, pressure-filled prompts.

Suggested repo: sycophancy-check

"Is your model agreeing with you just to please you? Benchmark its epistemic integrity."

Estimated effort: 35h