arXiv4h ago

Redirected, Not Removed: Task-Dependent Stereotyping Reveals the Limits of LLM Alignments

Divyanshu Kumar, Ishita Gupta, Nitin Aravind Birur, Tanay Baswa, Sahil Agarwal, Prashanth Harshangi

View original ↗

Analysis

Viral velocity

low

Implementation gapYES

Novelty7/10

Categorypaper

Topics

alignmentethicsevaluationbenchmarkingllm

Opportunity Brief

Create an open-source evaluation suite that probes LLM bias using a hierarchical taxonomy across multiple task types. This tool should demonstrate how alignment wrappers can be bypassed using task-switching, helping developers audit models more effectively.

Suggested repo: bias-probe

"Your model passed the safety test, but is it actually biased? Expose hidden stereotypes across 9 hierarchical axes."

Estimated effort: 60h