Steven Au, Sujit Noronha
View original ↗Create a harness that automates epistemic attack testing for models. Developers can build an evaluation platform that allows teams to test their models against challenging, pressure-filled prompts.
Suggested repo: sycophancy-check
"Is your model agreeing with you just to please you? Benchmark its epistemic integrity."
Estimated effort: 35h