/u/carolinedfrasca
Develop a unified, hardware-agnostic inference engine that maximizes performance on Blackwell and MI355X without relying on proprietary, monolithic stacks. Developers should focus on a simplified implementation of Gemma 4's MoE architecture to achieve lower latency across diverse hardware.
Suggested repo: gemma-native
"Achieve 15% better throughput on Blackwell than vLLM with this hardware-agnostic Gemma 4 engine."
Estimated effort: 120h