Hanrui Luo, Shreyank N Gowda
Build an open-source evaluation suite for jailbreak detection that aggregates generation inconsistency scores. This helps developers identify which prompts trigger risky behaviors in their models.
Suggested repo: jailGuard
"Know your model's limits—detect jailbreaks before your users find them."
Estimated effort: 30h
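
As a rough illustration of what "aggregating generation inconsistency scores" could look like, here is a minimal sketch. It is not the authors' method. It assumes that each prompt has been sampled several times, that inconsistency is measured as the mean pairwise Jaccard distance between token sets, and that a higher score is the signal worth reviewing. The function names, the distance measure, and the 0.5 threshold are all hypothetical choices made for this example.

```python
from itertools import combinations
from statistics import mean


def jaccard_distance(a: str, b: str) -> float:
    """Token-set Jaccard distance: 0.0 for identical token sets, 1.0 for disjoint ones."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 0.0  # two empty generations are treated as identical
    return 1.0 - len(ta & tb) / len(ta | tb)


def inconsistency_score(generations: list[str]) -> float:
    """Mean pairwise distance across all generations sampled for one prompt."""
    pairs = list(combinations(generations, 2))
    if not pairs:
        return 0.0  # fewer than two generations: nothing to compare
    return mean(jaccard_distance(a, b) for a, b in pairs)


def rank_prompts(
    samples: dict[str, list[str]], threshold: float = 0.5
) -> list[tuple[str, float, bool]]:
    """Score every prompt and sort from most to least inconsistent.

    Each result is (prompt, score, flagged). The threshold is an
    illustrative assumption, not a calibrated value.
    """
    scored = [(prompt, inconsistency_score(gens)) for prompt, gens in samples.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [(prompt, score, score >= threshold) for prompt, score in scored]


if __name__ == "__main__":
    # Hypothetical sampled outputs, keyed by prompt identifier.
    samples = {
        "prompt_a": ["I cannot help with that", "I cannot help with that"],
        "prompt_b": ["I cannot help with that", "Sure, here is how"],
    }
    for prompt, score, flagged in rank_prompts(samples):
        print(f"{prompt}\t{score:.2f}\t{'REVIEW' if flagged else 'ok'}")
```

A real suite would likely swap the token-overlap measure for something semantic (for example, embedding similarity) and calibrate the threshold on labeled data. The ranking step would stay much the same.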