arXiv · 3h ago

DIVERSED: Relaxed Speculative Decoding via Dynamic Ensemble Verification

Ziyi Wang, Siva Rajesh Kasa, Ankith M S, Santhosh Kumar Kasa, Jiaru Zou, Sumit Negi, Ruqi Zhang, Nan Jiang, Qifan Song


Analysis

Viral velocity: low

Implementation gap: YES

Novelty: 8/10

Category: paper

Topics

inference, quantization

Opportunity Brief

Implement dynamic ensemble verification for speculative decoding. Relaxing the strict acceptance rule that vLLM and similar engines currently apply to draft tokens could raise the acceptance rate, and with it the achievable speedup, on a wider range of hardware.
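To make the idea concrete, here is a minimal sketch of what a relaxed acceptance rule might look like. It is not the paper's algorithm: it assumes the "ensemble" is a simple mixture of the target and draft distributions with a hypothetical weight `w`, and all function names are illustrative. Setting `w = 1.0` recovers the standard speculative-sampling rule, while smaller `w` accepts more draft tokens at the cost of drifting from the target distribution.

```python
import random

def mix(target, draft, w):
    """Assumed verifier: w * target + (1 - w) * draft (hypothetical ensemble)."""
    return [w * t + (1.0 - w) * d for t, d in zip(target, draft)]

def accept_prob(token, verify, draft):
    """Standard speculative-sampling acceptance: min(1, verify/draft)."""
    if draft[token] == 0.0:
        return 1.0
    return min(1.0, verify[token] / draft[token])

def expected_acceptance(verify, draft):
    """Probability that a token sampled from `draft` is accepted."""
    return sum(min(v, d) for v, d in zip(verify, draft))

def verify_token(token, target, draft, w, rng):
    """Accept the draft token, or resample from the residual distribution.

    Returns (token, accepted). With w = 1.0 this is the usual exact rule.
    """
    verify = mix(target, draft, w)
    if rng.random() < accept_prob(token, verify, draft):
        return token, True
    # Residual: the probability mass the draft under-covers.
    residual = [max(v - d, 0.0) for v, d in zip(verify, draft)]
    if sum(residual) == 0.0:
        return token, True
    resampled = rng.choices(range(len(residual)), weights=residual)[0]
    return resampled, False

if __name__ == "__main__":
    target = [0.7, 0.2, 0.1]  # toy target-model distribution
    draft = [0.4, 0.4, 0.2]   # toy draft-model distribution
    for w in (1.0, 0.5):
        rate = expected_acceptance(mix(target, draft, w), draft)
        print(f"w={w}: expected acceptance rate {rate:.2f}")
```

Under this mixture assumption the expected acceptance rate is 1 - w * TV(target, draft), where TV is the total variation distance, so the weight trades fidelity to the target model directly against accepted tokens per step. A dynamic scheme would choose `w` per token or per step rather than fixing it.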

Suggested repo: relaxed-spec

"Faster inference via smarter speculation: relax your verification bottleneck."

Estimated effort: 80h