Divyanshu Kumar, Ishita Gupta, Nitin Aravind Birur, Tanay Baswa, Sahil Agarwal, Prashanth Harshangi
View original ↗Create an open-source evaluation suite that probes LLM bias using a hierarchical taxonomy across multiple task types. This tool should demonstrate how alignment wrappers can be bypassed using task-switching, helping developers audit models more effectively.
Suggested repo: bias-probe
"Your model passed the safety test, but is it actually biased? Expose hidden stereotypes across 9 hierarchical axes."
Estimated effort: 60h