Reasoning + LLM + Inference | hypedar


20.0

Develop a token-level RLHF wrapper that optimizes for prompt robustness, so that models do not fail on subtle variations of the same input. This kind of robustness is critical for reliable agents.
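One possible reading of "token-level" prompt robustness is to score the same response under several rephrasings of a prompt and, at each response token, weight the worst-performing variant more heavily than the average one. The sketch below assumes that reading and uses the standard KL-regularized distributionally robust objective, whose dual is a soft maximum, tau * log(mean(exp(loss / tau))). It is an illustration only, not the method of any signal listed on this page. The names `robust_token_loss` and `logmeanexp`, the temperature `tau`, and the input layout are hypothetical.

```python
import math
from typing import Sequence


def logmeanexp(values: Sequence[float], tau: float) -> float:
    """Return tau * log(mean(exp(v / tau))), computed stably.

    This is the dual of a KL-regularized worst case over a uniform
    distribution on `values`: it lies between their mean and their max.
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    scaled = [v / tau for v in values]
    m = max(scaled)  # subtract the max before exponentiating to avoid overflow
    mean_exp = sum(math.exp(s - m) for s in scaled) / len(scaled)
    return tau * (m + math.log(mean_exp))


def robust_token_loss(token_losses: Sequence[Sequence[float]], tau: float = 1.0) -> float:
    """Hypothetical token-level robust loss over prompt variants.

    token_losses[k][t] is the loss (e.g. negative log-probability) of
    response token t when the prompt is replaced by variant k. Because the
    response is fixed, positions align across variants. Each position's
    losses are combined with the soft maximum, then averaged over positions.
    """
    num_variants = len(token_losses)
    if num_variants == 0:
        raise ValueError("need at least one prompt variant")
    length = len(token_losses[0])
    if length == 0 or any(len(row) != length for row in token_losses):
        raise ValueError("all variants must score the same non-empty response")
    per_token = [
        logmeanexp([token_losses[k][t] for k in range(num_variants)], tau)
        for t in range(length)
    ]
    return sum(per_token) / length


# Usage: two rephrasings of one prompt, scoring the same three-token response.
losses = [[0.2, 0.5, 0.1], [0.3, 1.5, 0.1]]
print(robust_token_loss(losses, tau=0.5))
```

A small `tau` pushes each token's loss toward its worst prompt variant, while a large `tau` recovers the plain average, so `tau` trades robustness against average-case fit. In an actual RLHF wrapper the per-token losses would come from the policy's log-probabilities and would usually be weighted by advantages, which this sketch leaves out.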


emerging · implementation gap

robustness · reasoning · prompting · llm · healthcare · rlhf · inference

Signals (23)

Distributionally Robust Token Optimization in RLHF

Medical Reasoning with Large Language Models: A Survey and MR-Bench

What Makes a Good Response? An Empirical Analysis of Quality in Qualitative Interviews

Robust LLM Performance Certification via Constrained Maximum Likelihood Estimation

CresOWLve: Benchmarking Creative Problem-Solving Over Real-World Knowledge

$S^3$: Stratified Scaling Search for Test-Time in Diffusion Language Models

Beyond LLM-as-a-Judge: Deterministic Metrics for Multilingual Generative Text Evaluation

Inclusion-of-Thoughts: Mitigating Preference Instability via Purifying the Decision Space

Are Arabic Benchmarks Reliable? QIMMA's Quality-First Approach to LLM Evaluation

Decomposing the Delta: What Do Models Actually Learn from Preference Pairs?

HuggingFace · 4d ago

Waypoint-1.5: Higher-Fidelity Interactive Worlds for Everyday GPUs

Ranked Activation Shift for Post-Hoc Out-of-Distribution Detection

Robust Reasoning Benchmark

z-lab/dflash

The limits of bio-molecular modeling with large language models: a cross-scale evaluation

Attention-Based Sampler for Diffusion Language Models

StaRPO: Stability-Augmented Reinforcement Policy Optimization

Advantage-Guided Diffusion for Model-Based Reinforcement Learning

GNN-as-Judge: Unleashing the Power of LLMs for Graph Learning with GNN Feedback

Temperature-Dependent Performance of Prompting Strategies in Extended Reasoning Large Language Models

FVD: Inference-Time Alignment of Diffusion Models via Fleming-Viot Resampling

OpenBMB/VoxCPM

Re-Mask and Redirect: Exploiting Denoising Irreversibility in Diffusion Language Models