Ziyi Wang, Siva Rajesh Kasa, Ankith M S, Santhosh Kumar Kasa, Jiaru Zou, Sumit Negi, Ruqi Zhang, Nan Jiang, Qifan Song
Implement dynamic verification for speculative decoding. Relaxing the rigid token-acceptance constraints that are currently standard in vLLM and similar inference engines should enable speedups on a wider range of hardware.
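For context, the standard speculative-sampling verification rule accepts a drafted token x with probability min(1, p(x)/q(x)). Here p is the target model's distribution and q is the draft model's. On rejection, it resamples from the normalized residual max(0, p - q). The Python sketch below implements that rule. As a stand-in for the dynamic verification this project proposes, it adds a hypothetical fixed tolerance `tau` (an assumption, not from the source) that scales the denominator: `tau = 1.0` reproduces the exact rule, and `tau < 1.0` accepts more draft tokens. `verify_draft`, `_sample`, and `tau` are illustrative names, not vLLM APIs.

```python
import random
from typing import List, Optional, Sequence


def _sample(weights: Sequence[float], rng: random.Random) -> int:
    """Draw one index from an unnormalized categorical distribution."""
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


def verify_draft(
    draft_tokens: Sequence[int],
    draft_probs: Sequence[Sequence[float]],
    target_probs: Sequence[Sequence[float]],
    tau: float = 1.0,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Verify k drafted tokens against the target model.

    draft_probs[i] is q_i(.), the draft distribution at position i.
    target_probs has k + 1 rows p_0(.) .. p_k(.); the last row supplies the
    extra "bonus" token when every draft token is accepted.
    tau is a HYPOTHETICAL relaxation knob (not from the source):
    tau = 1.0 is the standard lossless rule, tau < 1.0 accepts more often.
    """
    if not 0.0 < tau <= 1.0:
        raise ValueError("tau must be in (0, 1]")
    if len(target_probs) != len(draft_tokens) + 1:
        raise ValueError("target_probs needs one more row than draft_tokens")
    if rng is None:
        rng = random.Random()
    out: List[int] = []
    for i, x in enumerate(draft_tokens):
        p, q = target_probs[i], draft_probs[i]
        # Standard rule: accept with probability min(1, p(x) / q(x)).
        # Assumed relaxed rule: divide by tau * q(x) instead.
        # Guard q(x) == 0, which cannot occur if x was actually sampled from q.
        ratio = p[x] / (tau * q[x]) if q[x] > 0 else 1.0
        if rng.random() < min(1.0, ratio):
            out.append(x)
            continue
        # Rejected: resample from the residual max(0, p - q), falling back
        # to p itself if the residual is numerically all zeros.
        residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
        out.append(_sample(residual if sum(residual) > 0 else p, rng))
        return out
    # Every draft token was accepted: take one extra token from the target.
    out.append(_sample(target_probs[-1], rng))
    return out
```

With `tau < 1.0` the emitted tokens no longer follow the target distribution exactly. Any real relaxation scheme would therefore need to weigh that quality cost against the gain in accepted tokens per target-model call.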
Suggested repo: relaxed-spec
"Faster inference via smarter speculation: relax your verification bottleneck."
Estimated effort: 80h