Build a simplified 'SGLang-lite' for edge devices, focused on small-model inference (1B-3B parameters) that prioritizes latency over throughput. This would target the on-device AI niche.
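To make the latency-over-throughput tradeoff concrete, a minimal sketch of what such a runtime's core loop might look like: one request at a time, batch size 1, greedy decoding, with per-token latency tracked instead of aggregate throughput. This is a hypothetical illustration only; `decode_latency_first` and the toy model stub are invented for this example and are not SGLang code.

```python
# Hypothetical latency-first decode loop for an edge runtime (not SGLang's API).
# Batch size 1 and greedy decoding minimize time-to-first-token and
# per-token latency, at the cost of the batched throughput a server would get.
import time

def decode_latency_first(model_step, prompt_ids, max_new_tokens=8, eos_id=-1):
    """Generate tokens one at a time for a single request (batch size 1).

    model_step: callable taking the token ids so far and returning the next
        token id -- a stand-in for a real 1B-3B model forward pass.
    """
    ids = list(prompt_ids)
    per_token_latency = []
    for _ in range(max_new_tokens):
        t0 = time.perf_counter()
        next_id = model_step(ids)          # one forward pass, no batching
        per_token_latency.append(time.perf_counter() - t0)
        if next_id == eos_id:              # stop as soon as the model is done
            break
        ids.append(next_id)
    return ids, per_token_latency

# Toy "model": emits last token + 1, signalling EOS once it reaches 5.
toy_model = lambda ids: ids[-1] + 1 if ids[-1] < 5 else -1

out, latencies = decode_latency_first(toy_model, [1])
print(out)  # [1, 2, 3, 4, 5]
```

On an edge device the metric that matters is the worst entry in `per_token_latency`, not tokens/sec across many requests, which is why this sketch never queues or batches requests.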