Krzysztof Fonal
Extend MLX-LM to natively support cross-tokenizer speculative decoding on Apple Silicon, so that a draft model and a target model with different tokenizers can be paired. This makes fast local LLM inference available for mixed-model workflows.
Suggested repo: SpeculateMLX
"Run speculative decoding on your Mac, even with mismatched models."
Estimated effort: 90h