omer_k
Build a memory-efficient quantization tool specifically for high-density compute tasks. Focus on techniques to minimize VRAM usage while maintaining throughput.
Suggested repo: lowMem
"Inference at scale when the hardware supply chain is broken."
Estimated effort: 150h