Create an open-source evaluation suite that probes LLM bias using a hierarchical taxonomy across multiple task types. This tool should demonstrate how alignment wrappers can be bypassed using task-switching, helping developers audit models more effectively.