fredmendoza
Create an optimized inference library specifically for CPU-bound small language models using modern instruction sets (AVX-512/AMX). Focus on reducing latency for models under 3B parameters.
Suggested repo: cpu-infer
"Stop wasting GPU cycles on small model tasks."
Estimated effort: 40h