Andrew Kiruluta
Develop a framework that bridges token-level pruning and model-level structured sparsity, optimizing LLMs for real-time inference by merging these two largely separate research directions.
Suggested repo: compress-infer
"Dynamic model compression that works in real-time."
Estimated effort: 80h
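A minimal sketch of the two techniques the brief proposes to combine, under illustrative assumptions: token-level pruning drops low-importance tokens at inference time, while structured sparsity zeroes whole weight rows offline. Function names, the score source, and the `keep_ratio` threshold are all hypothetical, not part of the original proposal.

```python
import numpy as np

def prune_tokens(hidden, scores, keep_ratio=0.5):
    """Token-level pruning: keep only the top-scoring fraction of tokens."""
    k = max(1, int(len(scores) * keep_ratio))
    keep = np.sort(np.argsort(scores)[-k:])  # preserve sequence order
    return hidden[keep]

def prune_rows(weight, keep_ratio=0.5):
    """Structured sparsity: zero entire rows with the smallest L2 norm."""
    norms = np.linalg.norm(weight, axis=1)
    k = max(1, int(len(norms) * keep_ratio))
    drop = np.argsort(norms)[:-k]  # all rows outside the top-k by norm
    pruned = weight.copy()
    pruned[drop] = 0.0
    return pruned

# Toy example: 6 tokens, hidden size 4, a linear layer mapping 4 -> 8.
rng = np.random.default_rng(0)
hidden = rng.normal(size=(6, 4))
scores = rng.random(6)             # e.g. attention-derived importance (assumed)
weight = rng.normal(size=(8, 4))

h = prune_tokens(hidden, scores)   # fewer tokens flow through the model
w = prune_rows(weight)             # half the output rows are zeroed
out = h @ w.T                      # cheaper forward pass on the pruned pair
```

The framework the brief envisions would presumably couple these two knobs, e.g. choosing `keep_ratio` per layer jointly for tokens and weights rather than tuning each in isolation.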