I trained a 2.8B Mamba model to reason entirely in its hidden state before outputting a single token — O(1) VRAM, no KV-cache, runs on a 12GB RTX 3060 | hypedar