arXiv12h ago

Investigating Counterfactual Unfairness in LLMs towards Identities through Humor

Shubin Kim, Yejin Son, Junyeong Park, Keummin Ka, Seungbeen Lee, Jaeyoung Lee, Hyeju Jang, Alice Oh, Youngjae Yu

View original ↗

Analysis

Viral velocity

low

Implementation gapYES

Novelty5/10

Categorydiscussion

Topics

discussionfine-tuning

Opportunity Brief

Develop an automated evaluation suite that detects counterfactual unfairness in LLMs using humor as a probe. This can be a critical tool for safety-focused developers.

Suggested repo: fair-humor

"Test model bias using the ultimate litmus test: humor."

Estimated effort: 35h