Anish Maddipoti
Build an open-source controller that handles disaggregated LLM inference (splitting the prefill and decode phases across separate worker pools) on standard Kubernetes clusters. Current serving tools usually treat inference as a monolith, so the compute-bound prefill phase and the memory-bound decode phase compete for the same GPUs, wasting resources.
Suggested repo: DecoupleLLM
"Maximize your LLM throughput by decoupling prefill and decode stages."
Estimated effort: 180h
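To make the idea concrete, here is a minimal sketch of the routing logic such a controller would embody: prefill and decode go to separate worker pools so each can be sized and scheduled independently. All names here (`DisaggregatedRouter`, `WorkerPool`, the pool labels) are illustrative assumptions, not part of any existing project, and a real system would also transfer the KV cache between pools.

```python
from dataclasses import dataclass, field

@dataclass
class Request:
    prompt: str
    max_new_tokens: int

@dataclass
class WorkerPool:
    # Hypothetical stand-in for a Kubernetes-backed pool of inference workers.
    name: str
    handled: list = field(default_factory=list)

    def submit(self, req: Request, phase: str) -> str:
        self.handled.append((phase, req.prompt))
        return f"{self.name}:{phase}"

class DisaggregatedRouter:
    """Routes the compute-bound prefill phase and the memory-bound decode
    phase of each request to separate worker pools."""

    def __init__(self, prefill_pool: WorkerPool, decode_pool: WorkerPool):
        self.prefill_pool = prefill_pool
        self.decode_pool = decode_pool

    def handle(self, req: Request):
        # Prefill: process the whole prompt once to build the KV cache.
        prefill_loc = self.prefill_pool.submit(req, "prefill")
        # Decode: generate tokens one at a time, reusing that cache
        # (cache transfer between pools is elided in this sketch).
        decode_loc = self.decode_pool.submit(req, "decode")
        return prefill_loc, decode_loc

router = DisaggregatedRouter(WorkerPool("prefill-gpu"), WorkerPool("decode-gpu"))
print(router.handle(Request("hello", 16)))
```

In a real controller the two pools would map to separate Kubernetes Deployments with independent autoscaling, which is what lets prefill and decode capacity scale on different signals.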