hypedar

Trending now

Privacy + Training + Agents (67) · Inference + Agents + LLM (67) · Math + Games (56)

hypedar

AI trend radar for developers. Catch emerging papers, repos, and discussions before the hype peaks.

About · GitHub · Discord

By the makers of hypedar

Codepawl

Open-source tools for developers.

About · Privacy · Terms · X

© 2026 Codepawl


arXiv · 9h ago · 4.3

Semantic Needles in Document Haystacks: Sensitivity Testing of LLM-as-a-Judge Similarity Scoring

Sinan G. Aksoy, Alexandra A. Sabrio, Erik VonKaenel, Lee Burke


Analysis

Viral velocity: low
Implementation gap: yes
Novelty: 5/10
Category: tool
Topics: evaluation, rag, inference

Opportunity Brief

Create an open-source library for automated robustness testing of LLM-as-a-judge systems via adversarial document perturbation: apply small, meaning-preserving edits to the judged documents and measure how much the judge's scores move. This lets RAG developers quantify how fragile their evaluation pipeline is to minor text changes.

Suggested repo: judge-stress

"Is your LLM evaluator actually blind?"

Estimated effort: 40h
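A minimal sketch of the core loop such a library would need, under stated assumptions: every name here is hypothetical (not an existing API), `judge` stands in for whatever LLM-as-a-judge call the pipeline uses, and only two trivially meaning-preserving perturbations are shown.

```python
import random
import statistics
from typing import Callable, Dict, List

def perturb_whitespace(text: str, rng: random.Random) -> str:
    # Insert one redundant space at a random position (meaning-preserving).
    i = rng.randrange(len(text) + 1)
    return text[:i] + " " + text[i:]

def perturb_case(text: str, rng: random.Random) -> str:
    # Flip the case of one random alphabetic character.
    idxs = [i for i, c in enumerate(text) if c.isalpha()]
    if not idxs:
        return text
    i = rng.choice(idxs)
    return text[:i] + text[i].swapcase() + text[i + 1:]

def judge_sensitivity(
    judge: Callable[[str, str], float],  # (reference, document) -> similarity score
    reference: str,
    document: str,
    n_trials: int = 20,
    seed: int = 0,
) -> Dict[str, float]:
    """Measure how much semantically neutral edits move the judge's score.

    Returns the unperturbed baseline score plus the mean and max absolute
    deviation across randomly perturbed copies of the document.
    """
    rng = random.Random(seed)
    baseline = judge(reference, document)
    perturbations = [perturb_whitespace, perturb_case]
    deltas: List[float] = []
    for _ in range(n_trials):
        perturb = rng.choice(perturbations)
        deltas.append(abs(judge(reference, perturb(document, rng)) - baseline))
    return {
        "baseline": baseline,
        "mean_abs_delta": statistics.mean(deltas),
        "max_abs_delta": max(deltas),
    }
```

A robust judge should report `mean_abs_delta` near zero here; a large value is the "fragile evaluator" signal the brief describes. A real library would swap in an actual LLM call for `judge` and stronger perturbations (synonym swaps, distractor sentences) for the two toy ones above.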