hypedar

AI trend radar for developers. Catch emerging papers, repos, and discussions before the hype peaks.

By the makers of hypedar

Codepawl

Open-source tools for developers.

© 2026 Codepawl

RAG + Evaluation

14.0

Create an automated benchmarking suite to measure AI 'continuity': how well models maintain state across long sessions and multiple storage layers.

+0

emerging · implementation gap

rag · evaluation

Signals (6)

DeltaLogic: Minimal Premise Edits Reveal Belief-Revision Failures in Logical Reasoning Models

Xpertbench: Expert Level Tasks with Rubrics-Based Evaluation

ATANT: An Evaluation Framework for AI Continuity

Google AI · 18h ago

ConvApparel: Measuring and bridging the realism gap in user simulators

This Treatment Works, Right? Evaluating LLM Sensitivity to Patient Question Framing in Medical QA

Beyond Surface Judgments: Human-Grounded Risk Evaluation of LLM-Generated Disinformation