Xinhao Huang, You-Liang Huang, Zeyi Wen
Implement a training-free compression engine that uses soft activation sparsity to shrink models for deployment on consumer GPUs. The engine is meant as a drop-in replacement for standard quantization methods.
Suggested repo: SoLA-Engine
"Slim your models without the training hassle or hardware-specific requirements."
Estimated effort: 60h
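The card gives no algorithmic detail, so the following is only a hypothetical starting sketch under one assumed reading of "soft activation sparsity". It is not the authors' algorithm. The sketch scores each FFN intermediate neuron by its mean activation magnitude on calibration data. It keeps the down-projection rows of the few most active neurons exact and compresses the remaining rows with a truncated SVD instead of dropping them, so inactive neurons are softened rather than hard-pruned. It makes several simplifying assumptions: a plain ReLU FFN (`y = relu(x @ W_up) @ W_down`), only `W_down` is compressed, the SVD is plain rather than activation-aware, and a single fixed rank is used. All function names (`neuron_scores`, `compress_down_proj`, `compressed_ffn`, `relative_error`) are illustrative, and the weights are random NumPy toys, not a real model.

```python
import numpy as np

def neuron_scores(X, W_up):
    """Mean |activation| of each FFN intermediate neuron over calibration tokens.

    Assumes a plain ReLU FFN: h = relu(x @ W_up), y = h @ W_down.
    """
    H = np.maximum(X @ W_up, 0.0)   # (n_tokens, d_ff)
    return np.abs(H).mean(axis=0)   # (d_ff,)

def compress_down_proj(W_down, scores, keep_frac=0.1, rank=16):
    """Keep rows of W_down for the most active neurons exactly and approximate
    the remaining rows with a rank-`rank` truncated SVD (hypothetical scheme)."""
    d_ff = W_down.shape[0]
    # Keep at least one neuron and leave at least one for the low-rank part.
    k = min(max(1, int(round(keep_frac * d_ff))), d_ff - 1)
    order = np.argsort(scores)[::-1]          # most active neurons first
    keep_idx, rest_idx = np.sort(order[:k]), np.sort(order[k:])
    U, s, Vt = np.linalg.svd(W_down[rest_idx], full_matrices=False)
    r = min(rank, s.size)
    A = U[:, :r] * s[:r]                      # (d_ff - k, r)
    B = Vt[:r]                                # (r, d_model)
    return keep_idx, W_down[keep_idx], rest_idx, A, B

def compressed_ffn(X, W_up, keep_idx, W_keep, rest_idx, A, B):
    """FFN forward pass with the split dense + low-rank down projection."""
    H = np.maximum(X @ W_up, 0.0)
    return H[:, keep_idx] @ W_keep + (H[:, rest_idx] @ A) @ B

def relative_error(Y_ref, Y_approx):
    """Relative Frobenius-norm error of the compressed output."""
    return np.linalg.norm(Y_ref - Y_approx) / np.linalg.norm(Y_ref)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_model, d_ff, n_tokens = 64, 256, 512
    W_up = rng.standard_normal((d_model, d_ff)) / np.sqrt(d_model)
    W_down = rng.standard_normal((d_ff, d_model)) / np.sqrt(d_ff)
    X = rng.standard_normal((n_tokens, d_model))  # stand-in calibration set

    parts = compress_down_proj(W_down, neuron_scores(X, W_up), keep_frac=0.1, rank=16)
    keep_idx, W_keep, rest_idx, A, B = parts
    Y = np.maximum(X @ W_up, 0.0) @ W_down
    print("params:", W_keep.size + A.size + B.size, "vs dense", W_down.size)
    print("relative output error:", relative_error(Y, compressed_ffn(X, W_up, *parts)))
```

With these toy sizes the down projection shrinks from 16384 to 6368 parameters: 26 exact rows plus rank-16 factors for the other 230. `keep_frac` and `rank` trade size against output error. A real engine would also need to handle `W_up` and gated FFNs, choose ranks per layer, and measure accuracy on actual calibration text.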