Yousra Fettach, Guillaume Bied, Hannu Toivonen, Tijl De Bie
Build a benchmark suite that evaluates LLMs on subjective, culturally grounded tasks such as humor. Developers should create an extensible framework that ingests varied datasets (e.g., jokes, puns, sarcasm) and measures "humor-alignment" across different models.
Suggested repo: nanoEval-Humor
"Is your model actually funny or just regurgitating datasets?"
Estimated effort: 20h
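A minimal sketch of what the extensible harness could look like: a dataset abstraction, a model interface, and an alignment score comparing model ratings to human ratings. All names here (`HumorDataset`, `Item`, `alignment`, the toy data, and the trivial length-based baseline) are hypothetical illustrations, not part of any existing nanoEval-Humor codebase; a real suite would plug in actual LLM API calls and curated, human-annotated datasets.

```python
# Hypothetical sketch of an extensible humor-alignment benchmark.
# Real implementations would swap the toy dataset and baseline model
# for curated corpora and actual LLM calls.
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Item:
    text: str            # the joke / pun / sarcastic remark
    human_score: float   # mean human funniness rating in [0, 1]


class HumorDataset(ABC):
    """Base class: subclass per corpus (jokes, puns, sarcasm, ...)."""
    name: str

    @abstractmethod
    def items(self) -> list[Item]: ...


class ToyJokesDataset(HumorDataset):
    # Invented examples for illustration only.
    name = "toy-jokes"

    def items(self) -> list[Item]:
        return [
            Item("I told my wife she was drawing her eyebrows too high. "
                 "She looked surprised.", 0.8),
            Item("The quarterly report is due on Friday.", 0.1),
        ]


class Model(ABC):
    """Base class: wrap each LLM behind a uniform rating interface."""
    name: str

    @abstractmethod
    def rate(self, text: str) -> float:
        """Return the model's funniness rating in [0, 1]."""


class LengthBaseline(Model):
    # Trivial stand-in: longer text -> "funnier". Replace with an LLM wrapper.
    name = "length-baseline"

    def rate(self, text: str) -> float:
        return min(len(text) / 100.0, 1.0)


def alignment(model: Model, dataset: HumorDataset) -> float:
    """Humor-alignment = 1 - mean absolute error vs. human ratings."""
    items = dataset.items()
    mae = sum(abs(model.rate(i.text) - i.human_score) for i in items) / len(items)
    return 1.0 - mae


score = alignment(LengthBaseline(), ToyJokesDataset())
print(f"{LengthBaseline.name}: humor-alignment = {score:.2f}")
```

Keeping models and datasets behind abstract interfaces is what makes the suite extensible: adding a new corpus or a new LLM means writing one subclass, and every model/dataset pair can then be scored on a shared leaderboard.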