Ranjith Chodavarapu, Lei Xu
Create an evaluation suite for inference engines (like vLLM) that measures FP16 accumulation divergence caused by KV caching.
Suggested repo: kv-drift-bench
"Verify if your KV cache is actually numerically stable."
Estimated effort: 20h
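
As a rough illustration of the kind of measurement such a suite might make, the sketch below runs a toy single-head attention decode with an FP16 KV cache and compares each step against a float64 recompute without a cache. Everything here is an assumption made for illustration: the names `attend` and `kv_drift`, the random weights, and the choice of max absolute error as the metric are hypothetical, and nothing in it calls vLLM or any real inference engine. In a real engine the divergence would depend on its own kernels and reduction order, which this NumPy toy does not model.

```python
import numpy as np


def attend(q, K, V):
    """Single-head softmax attention for one query, in the dtype of the inputs."""
    scale = np.sqrt(q.dtype.type(q.shape[0]))
    scores = (K @ q) / scale
    weights = np.exp(scores - scores.max())  # subtract max for stability
    weights = weights / weights.sum()
    return weights @ V


def kv_drift(seq_len=32, d=16, seed=0):
    """Per-step max abs error of an FP16 cached decode vs. a float64 recompute.

    Hypothetical helper: a toy model, not an engine benchmark.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((seq_len, d))
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))

    # FP16 copies of the inputs and weights for the cached path.
    X16 = X.astype(np.float16)
    Wq16, Wk16, Wv16 = (W.astype(np.float16) for W in (Wq, Wk, Wv))

    k_cache, v_cache = [], []
    drift = np.empty(seq_len)
    for t in range(seq_len):
        # Cached path: project only the new token, append to the FP16 cache.
        k_cache.append(X16[t] @ Wk16)
        v_cache.append(X16[t] @ Wv16)
        out16 = attend(X16[t] @ Wq16, np.stack(k_cache), np.stack(v_cache))

        # Reference path: recompute K and V for the whole prefix in float64.
        ctx = X[: t + 1]
        out64 = attend(X[t] @ Wq, ctx @ Wk, ctx @ Wv)

        drift[t] = np.max(np.abs(out16.astype(np.float64) - out64))
    return drift


if __name__ == "__main__":
    for step, err in enumerate(kv_drift()):
        print(f"step {step:2d}  max abs error {err:.3e}")
```

A fuller benchmark would presumably swap the toy attention for real engine outputs (for example, logits with the cache enabled versus a full recompute) and report the error as a function of sequence length.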