Andrey Pustovit
Create a tool that precomputes and caches KV states as 'Knowledge Packs' and injects them directly into the model for RAG tasks, without the token overhead of placing the retrieved text in the prompt. This could significantly reduce latency and cost.
Suggested repo: kpack
"RAG for free: deliver facts directly via KV cache injection."
Estimated effort: 70h
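
To make the Knowledge Pack idea concrete, here is a minimal sketch using a toy single-head attention layer in pure Python. It assumes no positional encoding and random weights. The names `ToyAttention`, `build_pack`, and `last_token_output` are hypothetical and are not the proposed kpack API. It illustrates only that keys and values computed once for the fact tokens can be reused at query time in place of re-processing those tokens.

```python
import math
import random

DIM = 4  # toy embedding width (assumption for illustration)


def make_matrix(rng, rows, cols):
    """Random rows x cols weight matrix as nested lists."""
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a rows x cols matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    width = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(width)]


class ToyAttention:
    """Hypothetical one-layer, one-head stand-in for a real model."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.wq = make_matrix(rng, DIM, DIM)
        self.wk = make_matrix(rng, DIM, DIM)
        self.wv = make_matrix(rng, DIM, DIM)

    def kv(self, embeddings):
        """Project token embeddings to (keys, values)."""
        keys = [matvec(self.wk, e) for e in embeddings]
        values = [matvec(self.wv, e) for e in embeddings]
        return keys, values

    def last_token_output(self, embeddings, pack=None):
        """Attention output for the final token.

        `pack` is an optional (keys, values) pair for a prefix that was
        processed earlier; it is prepended instead of being recomputed.
        """
        keys, values = self.kv(embeddings)
        if pack is not None:
            keys = pack[0] + keys
            values = pack[1] + values
        query = matvec(self.wq, embeddings[-1])
        return attend(query, keys, values)


def build_pack(model, fact_embeddings):
    """Precompute the KV states for a block of facts (the 'Knowledge Pack')."""
    return model.kv(fact_embeddings)


rng = random.Random(1)
facts = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(6)]
question = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(3)]

model = ToyAttention()
pack = build_pack(model, facts)  # done once, offline

baseline = model.last_token_output(facts + question)      # facts in the prompt
injected = model.last_token_output(question, pack=pack)   # facts from the pack

print(all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(baseline, injected)))
```

Both calls attend over the same keys and values in the same order, so the outputs agree; the injected path just skips projecting the fact tokens again. In a real causal transformer the same reasoning applies per layer, since a prefix's keys and values depend only on the prefix. A pack would still be tied to the exact model weights and the positions at which it was computed, and it still occupies cache memory at inference time.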