Ahson Saiyed, Sabrina Sadiekh, Chirag Agarwal
Provide a lightweight wrapper for deploying Sparse Autoencoders (SAEs) on existing LLMs to detect and mitigate jailbreak attacks. This is a critical security layer for enterprise AI systems running in production.
Suggested repo: sae-guard
"Protect your model from jailbreaks by observing its internal state."
Estimated effort: 70h
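
The card does not say how detection or mitigation would work, so the following is only a minimal sketch of one plausible design, under stated assumptions: an SAE encodes a hidden activation into sparse features, a hand-picked set of "flagged" feature indices is compared against a threshold to detect a jailbreak, and mitigation subtracts the decoded contribution of those features from the activation (feature ablation). The names `TinySAE`, `detect_jailbreak`, `ablate_features`, the weights, the flagged indices, and the threshold are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence, Set


@dataclass
class TinySAE:
    """Toy sparse autoencoder (hypothetical): features = ReLU(W_enc x + b_enc)."""
    w_enc: List[List[float]]  # n_features x d_model
    b_enc: List[float]        # n_features
    w_dec: List[List[float]]  # n_features x d_model (one decoder direction per feature)

    def encode(self, x: Sequence[float]) -> List[float]:
        # Project the activation onto each encoder row, add bias, apply ReLU.
        return [
            max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(self.w_enc, self.b_enc)
        ]

    def decode(self, feats: Sequence[float]) -> List[float]:
        # Weighted sum of decoder directions (no decoder bias in this toy).
        d_model = len(self.w_dec[0])
        return [
            sum(f * row[j] for f, row in zip(feats, self.w_dec))
            for j in range(d_model)
        ]


def detect_jailbreak(sae: TinySAE, activation: Sequence[float],
                     flagged: Set[int], threshold: float) -> bool:
    """Return True if any flagged feature fires above the threshold."""
    feats = sae.encode(activation)
    return any(feats[i] > threshold for i in flagged)


def ablate_features(sae: TinySAE, activation: Sequence[float],
                    flagged: Set[int]) -> List[float]:
    """Subtract the decoded contribution of flagged features from the activation."""
    feats = sae.encode(activation)
    only_flagged = [f if i in flagged else 0.0 for i, f in enumerate(feats)]
    delta = sae.decode(only_flagged)
    return [a - d for a, d in zip(activation, delta)]


# Illustrative values only: identity encoder/decoder, feature 1 assumed "jailbreak-related".
sae = TinySAE(w_enc=[[1.0, 0.0], [0.0, 1.0]], b_enc=[0.0, 0.0],
              w_dec=[[1.0, 0.0], [0.0, 1.0]])
flagged = {1}
benign = [0.9, 0.1]
suspicious = [0.2, 1.5]

print(detect_jailbreak(sae, benign, flagged, threshold=1.0))
print(detect_jailbreak(sae, suspicious, flagged, threshold=1.0))
print(ablate_features(sae, suspicious, flagged))
```

In a real deployment the activation would come from a chosen layer of the LLM (for example, captured with a PyTorch forward hook) and the SAE weights and flagged features would be learned or selected from data. None of that is specified by the card.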