Develop a generalized library that abstracts block diffusion speculative decoding so it can be applied beyond Transformers to non-Transformer architectures. Focus on making it a plug-and-play adapter for existing local inference backends such as vLLM or llama.cpp.
Suggested repo: drafty
"Accelerate your local inference with block diffusion drafting."
Estimated effort: 40h
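A minimal sketch of what the adapter surface could look like, assuming a greedy verify loop for simplicity; the class names (`BlockDrafter`, `TargetBackend`) and method signatures here are hypothetical, not an existing vLLM or llama.cpp API:

```python
from abc import ABC, abstractmethod
from typing import List


class BlockDrafter(ABC):
    """Interface a block diffusion drafter would implement (hypothetical)."""

    @abstractmethod
    def propose_block(self, prefix: List[int], block_size: int) -> List[int]:
        """Propose up to block_size candidate tokens extending prefix."""


class TargetBackend(ABC):
    """Adapter over a local inference backend (e.g. vLLM or llama.cpp bindings)."""

    @abstractmethod
    def next_token(self, prefix: List[int]) -> int:
        """Greedy next token from the target model given prefix."""


def speculative_step(drafter: BlockDrafter, target: TargetBackend,
                     prefix: List[int], block_size: int = 4) -> List[int]:
    """One draft/verify round: keep the longest draft prefix the target
    agrees with, then append one target token so progress is always made."""
    draft = drafter.propose_block(prefix, block_size)
    accepted: List[int] = []
    for tok in draft:
        expected = target.next_token(prefix + accepted)
        if tok != expected:
            # On a mismatch, the target's own token replaces the bad draft.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
    # Whole block accepted; take one bonus token from the target.
    accepted.append(target.next_token(prefix + accepted))
    return accepted
```

With this split, supporting a new backend means implementing only `TargetBackend`, and a new drafter architecture means implementing only `BlockDrafter`; the verify loop stays shared.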