r/LocalLLaMA · 9h ago

Running SmolLM2‑360M on a Samsung Galaxy Watch 4 (380MB RAM) – 74% RAM reduction in llama.cpp

/u/RecognitionFlat1470


Analysis

Viral velocity: low
Implementation gap: yes
Novelty: 9/10
Category: tool
Topics: inference, optimization

Opportunity Brief

Implement 'mmap-first' loading for llama.cpp on low-power devices: map model weights read-only from storage so the kernel pages them in on demand, rather than copying them into the process's heap. Because file-backed pages can be evicted and re-read under memory pressure, resident RAM drops sharply (the post reports a 74% reduction), making small LLMs viable on consumer smartwatches.

Suggested repo: NanoLLM

"Run LLMs on your wrist with extreme memory efficiency."

Estimated effort: 50h