arXiv · 7h ago

Sampling for Quality: Training-Free Reward-Guided LLM Decoding via Sequential Monte Carlo

Jelena Markovic-Voronov, Wenhui Zhu, Bo Long, Zhipeng Wang, Suyash Gupta, Kayhan Behdin, Bee-Chung Chen, Deepak Agarwal

Analysis

Viral velocity: low

Implementation gap: YES

Novelty: 9/10

Category: paper

Topics: inference, llm, sampling

Opportunity Brief

Create a drop-in decoding library for LLMs that performs reward-guided sampling with Sequential Monte Carlo (SMC). Developers could then steer generation at inference time toward outputs that score highly under a chosen reward function, without additional fine-tuning.
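As a rough illustration of what such a library's core loop might look like, here is a minimal sketch of generic SMC decoding: a set of particles (partial sequences) is extended token by token, reweighted by the change in reward, and resampled when the weights degenerate. This is a textbook-style particle filter, not necessarily the algorithm in the paper. The names `toy_lm_probs`, `toy_reward`, and `smc_decode`, the four-token vocabulary, and the `beta` temperature are all hypothetical stand-ins; a real implementation would call an actual LLM and reward model.

```python
import math
import random

# Hypothetical toy vocabulary; a real library would use the model's tokenizer.
VOCAB = ["good", "bad", "ok", "<eos>"]


def toy_lm_probs(prefix):
    """Assumed stand-in for an LLM's next-token distribution (uniform here)."""
    return [1.0 / len(VOCAB)] * len(VOCAB)


def toy_reward(tokens):
    """Assumed reward on a partial sequence: +1 per "good", -1 per "bad"."""
    return sum((t == "good") - (t == "bad") for t in tokens)


def normalize(log_w):
    """Turn log-weights into normalized weights, subtracting the max for stability."""
    m = max(log_w)
    w = [math.exp(lw - m) for lw in log_w]
    total = sum(w)
    return [x / total for x in w]


def smc_decode(num_particles=64, max_len=8, beta=1.0, seed=0):
    """Sample sequences approximately from p_LM(x) * exp(beta * reward(x))."""
    rng = random.Random(seed)
    particles = [[] for _ in range(num_particles)]
    log_w = [0.0] * num_particles

    for _ in range(max_len):
        for i, seq in enumerate(particles):
            if seq and seq[-1] == "<eos>":
                continue  # finished particles are carried forward unchanged
            prev_reward = toy_reward(seq)
            # Propose the next token from the base model itself.
            tok = rng.choices(VOCAB, weights=toy_lm_probs(seq))[0]
            seq.append(tok)
            # With the LM as proposal, the incremental log-weight is the
            # scaled change in reward.
            log_w[i] += beta * (toy_reward(seq) - prev_reward)

        w = normalize(log_w)
        ess = 1.0 / sum(x * x for x in w)  # effective sample size
        if ess < num_particles / 2:
            # Multinomial resampling: duplicate heavy particles, drop light ones.
            idx = rng.choices(range(num_particles), weights=w, k=num_particles)
            particles = [list(particles[j]) for j in idx]
            log_w = [0.0] * num_particles  # weights are uniform after resampling

    return particles, normalize(log_w)


if __name__ == "__main__":
    particles, weights = smc_decode()
    avg = sum(w * toy_reward(p) for p, w in zip(particles, weights))
    print("weighted average reward:", avg)
    best = max(zip(weights, particles))[1]
    print("highest-weight particle:", " ".join(best))
```

The base model's weights are never touched: steering comes entirely from reweighting and resampling, which is what makes the approach training-free. Setting `beta=0.0` in this sketch recovers plain sampling from the toy model.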

Suggested repo: smc-sampler

"Steer LLM output without training a single weight."

Estimated effort: 50h