Cristina Garbacea, Heran Wang, Chenhao Tan
Develop a framework that re-ranks standard LLM benchmarks based on user-provided preference weights. This lets researchers view model performance through the lens of specific user archetypes rather than a single flat average across tasks.
Suggested repo: pref-eval
"Benchmarks that actually care about your specific preferences."
Estimated effort: 40h
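The core re-ranking step could be sketched as a preference-weighted average over per-benchmark scores. The model names, benchmark names, and scores below are illustrative placeholders, not real results, and the function is an assumed minimal design rather than any existing pref-eval API:

```python
def rerank(scores, weights):
    """Rank models by the preference-weighted average of benchmark scores.

    scores:  {model: {benchmark: score}} -- per-benchmark results.
    weights: {benchmark: weight} -- user archetype preferences,
             normalized internally so only relative magnitudes matter.
    """
    total = sum(weights.values())
    norm = {b: w / total for b, w in weights.items()}
    agg = {
        model: sum(norm.get(bench, 0.0) * score
                   for bench, score in per_bench.items())
        for model, per_bench in scores.items()
    }
    # Highest weighted score first.
    return sorted(agg, key=agg.get, reverse=True)


# Hypothetical scores for two models on two benchmarks.
scores = {
    "model_a": {"reasoning": 0.9, "coding": 0.4},
    "model_b": {"reasoning": 0.6, "coding": 0.8},
}

# Flat average: model_b leads (0.70 vs 0.65).
print(rerank(scores, {"reasoning": 1.0, "coding": 1.0}))
# A reasoning-heavy archetype flips the ranking (0.80 vs 0.64).
print(rerank(scores, {"reasoning": 0.8, "coding": 0.2}))
```

The point of the example is that the same leaderboard data yields different orderings under different archetypes, which is the behavior the framework would surface.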