Jelena Markovic-Voronov, Wenhui Zhu, Bo Long, Zhipeng Wang, Suyash Gupta, Kayhan Behdin, Bee-Chung Chen, Deepak Agarwal
Create a drop-in decoding library for LLMs that performs reward-guided sampling using Sequential Monte Carlo (SMC). This lets developers steer LLM generation toward outputs that score well under a chosen reward function, without additional fine-tuning.
Suggested repo: smc-sampler
"Steer LLM output without training a single weight."
Estimated effort: 50h
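
To make the idea concrete, here is a minimal sketch of reward-guided SMC decoding. It is not the authors' implementation. It assumes a toy uniform "language model" (`toy_next_token_probs`), a hypothetical reward that counts the token "a" (`toy_reward`), and a tempering strength `beta`. All of these names are illustrative. Each particle is a partial sequence. After every token, a particle's weight is multiplied by exp(beta * reward gain), and the population is resampled when the effective sample size (ESS) drops below a threshold.

```python
import math
import random

VOCAB = ["a", "b", "c"]  # hypothetical toy vocabulary


def toy_next_token_probs(prefix):
    """Stand-in for an LLM: a uniform next-token distribution (assumption)."""
    return [1.0 / len(VOCAB)] * len(VOCAB)


def toy_reward(tokens):
    """Hypothetical reward: the number of 'a' tokens in the sequence."""
    return float(tokens.count("a"))


def normalize(log_w):
    """Turn unnormalized log-weights into probabilities, stably."""
    m = max(log_w)
    w = [math.exp(lw - m) for lw in log_w]
    total = sum(w)
    return [x / total for x in w]


def smc_sample(n_particles=200, length=8, beta=1.0, ess_frac=0.5, seed=0):
    """Approximate sampling from p_LM(x) * exp(beta * reward(x)) with SMC."""
    rng = random.Random(seed)
    particles = [[] for _ in range(n_particles)]
    log_w = [0.0] * n_particles

    for _ in range(length):
        # Propose: extend every particle with one token drawn from the base model.
        for i, seq in enumerate(particles):
            before = toy_reward(seq)
            probs = toy_next_token_probs(seq)
            seq.append(rng.choices(VOCAB, weights=probs, k=1)[0])
            # Reweight by the incremental reward gained from the new token.
            log_w[i] += beta * (toy_reward(seq) - before)

        # Resample (multinomial) when the effective sample size gets too low.
        w = normalize(log_w)
        ess = 1.0 / sum(x * x for x in w)
        if ess < ess_frac * n_particles:
            idx = rng.choices(range(n_particles), weights=w, k=n_particles)
            particles = [list(particles[j]) for j in idx]  # copy to keep particles independent
            log_w = [0.0] * n_particles

    return particles, normalize(log_w)


if __name__ == "__main__":
    particles, weights = smc_sample()
    mean_reward = sum(w * toy_reward(p) for p, w in zip(particles, weights))
    print("weighted mean reward:", round(mean_reward, 2))
    print("unguided expectation:", round(8 / 3, 2))
```

In a real library the toy model would be replaced by next-token probabilities from an actual LLM, and the reward by a learned or rule-based scorer. The resampling scheme (multinomial here) and the ESS threshold are design choices that such a library would likely expose as options.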