arXiv · 13h ago

Cross-Family Speculative Decoding for Polish Language Models on Apple Silicon: An Empirical Evaluation of Bielik 11B with UAG-Extended MLX-LM

Krzysztof Fonal


Analysis

Viral velocity: low

Implementation gap: YES

Novelty: 8/10

Category: paper

Topics

inference, ml-accelerator, mlx

Opportunity Brief

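Extend MLX-LM to natively support cross-tokenizer speculative decoding on Apple Silicon, so that a draft model and a target model from different families, with different tokenizers, can be paired. This would bring faster local LLM inference to mixed-model workflows.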

Suggested repo: SpeculateMLX

"Run speculative decoding on your Mac, even with mismatched models."

Estimated effort: 90h
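
To make the brief concrete, here is a minimal sketch of the idea behind cross-tokenizer speculative decoding: the draft model proposes text in its own vocabulary, the text is re-encoded with the target tokenizer, and the target model keeps only the tokens it would have produced itself. Everything in it is a hypothetical toy stand-in (the regex "target tokenizer", the character-level "draft model", the fixed reference sentences, and all function names). It is not the paper's UAG implementation and not the MLX-LM API.

```python
import re
from typing import List, Optional, Tuple

# Toy stand-ins (assumptions for illustration only).
REFERENCE_TEXT = "ala ma kota a kot ma ale"   # what the toy target model "wants" to say
DRAFT_TEXT = "ala ma kota a pies ma ale"      # what the toy draft model "wants" to say


def target_encode(text: str) -> List[str]:
    """Toy target tokenizer: one token per word, with its leading space."""
    return re.findall(r" ?[^ ]+", text)


def target_decode(tokens: List[str]) -> str:
    return "".join(tokens)


REFERENCE = target_encode(REFERENCE_TEXT)


def target_next(tokens: List[str]) -> Optional[str]:
    """Toy greedy target model: next reference token, or None at end of sequence."""
    return REFERENCE[len(tokens)] if len(tokens) < len(REFERENCE) else None


def draft_propose(text: str, k: int) -> str:
    """Toy character-level draft model: propose the next k characters."""
    return DRAFT_TEXT[len(text):len(text) + k]


def speculative_generate(prompt: str, k: int = 8, max_tokens: int = 32) -> Tuple[str, int]:
    committed = target_encode(prompt)
    accepted = 0  # number of draft-proposed target tokens that were kept
    while len(committed) < max_tokens:
        text = target_decode(committed)
        # Bridge the vocabularies through text: re-encode with the target tokenizer.
        full = target_encode(text + draft_propose(text, k))
        if full[:len(committed)] == committed:
            candidate = full[len(committed):]
        else:
            candidate = []  # draft text merged into a committed token; discard it
        # Verify candidates. A real system scores them in one target forward pass;
        # sequential calls emulate that here.
        for tok in candidate:
            if len(committed) >= max_tokens or tok != target_next(committed):
                break
            committed.append(tok)
            accepted += 1
        if len(committed) >= max_tokens:
            break
        # The target always contributes one token: a correction or a bonus token.
        nxt = target_next(committed)
        if nxt is None:
            break
        committed.append(nxt)
    return target_decode(committed), accepted


text, accepted = speculative_generate("ala")
print(text, accepted)
```

Because every committed token equals what the toy target model would have chosen greedily, the output matches target-only decoding regardless of draft quality. The draft only changes how many tokens are accepted per verification step.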