Create an automated benchmarking suite to measure AI 'continuity'—how well models maintain state across long sessions and multiple storage layers.
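A minimal sketch of such a suite in Python follows. It assumes a hypothetical agent interface, `step(state, message) -> reply`, in which the model reads and writes an explicit state dictionary. The message formats `remember k=v` and `recall k` are also illustrative assumptions. Each run works in four steps:

1. Plant facts.
2. Pad the session with filler turns.
3. Round-trip the state through one storage layer: an in-process copy, a JSON file, or a SQLite table.
4. Score the fraction of facts recalled.

`KeyValueAgent` and `BoundedAgent` are toy stand-ins, not real models. A real harness would wrap an actual model call behind the same interface.

```python
import json
import sqlite3
import tempfile
from pathlib import Path
from typing import Dict

State = Dict[str, str]
# Hypothetical agent interface assumed here: agent.step(state, message) -> reply,
# where the agent may read and mutate the explicit `state` dict.


class MemoryLayer:
    """In-process storage: a copy of the state kept as a Python object."""
    def save(self, state: State) -> None:
        self._state = dict(state)

    def load(self) -> State:
        return dict(self._state)


class JsonFileLayer:
    """On-disk storage: the state serialized to a JSON file."""
    def __init__(self, path: Path) -> None:
        self.path = path

    def save(self, state: State) -> None:
        self.path.write_text(json.dumps(state))

    def load(self) -> State:
        return json.loads(self.path.read_text())


class SqliteLayer:
    """Database storage: the state stored as rows of a SQLite key/value table."""
    def __init__(self, path: Path) -> None:
        self.conn = sqlite3.connect(str(path))
        self.conn.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")

    def save(self, state: State) -> None:
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute("DELETE FROM kv")
            self.conn.executemany("INSERT INTO kv VALUES (?, ?)", state.items())

    def load(self) -> State:
        return dict(self.conn.execute("SELECT k, v FROM kv"))


class KeyValueAgent:
    """Toy stand-in for a model (hypothetical): 'remember k=v' stores, 'recall k' answers."""
    def step(self, state: State, message: str) -> str:
        if message.startswith("remember "):
            key, value = message[len("remember "):].split("=", 1)
            state[key] = value
            return "ok"
        if message.startswith("recall "):
            return state.get(message[len("recall "):], "")
        return ""  # filler turns carry no information


class BoundedAgent(KeyValueAgent):
    """Toy agent that keeps only its `capacity` most recently stored facts."""
    def __init__(self, capacity: int) -> None:
        self.capacity = capacity

    def step(self, state: State, message: str) -> str:
        reply = super().step(state, message)
        while len(state) > self.capacity:
            state.pop(next(iter(state)))  # evict the oldest inserted key
        return reply


def run_continuity_benchmark(agent, layers, facts: State, filler_turns: int) -> Dict[str, float]:
    """Per storage layer, the fraction of planted facts recalled after a save/load round trip."""
    scores = {}
    for name, layer in layers.items():
        state: State = {}
        for key, value in facts.items():   # 1. plant facts early in the session
            agent.step(state, f"remember {key}={value}")
        for i in range(filler_turns):      # 2. lengthen the session
            agent.step(state, f"filler turn {i}")
        layer.save(state)                  # 3. persist at a session boundary...
        restored = layer.load()            #    ...and resume from storage
        hits = sum(agent.step(restored, f"recall {k}") == v for k, v in facts.items())
        scores[name] = hits / len(facts)   # 4. score recall
    return scores


if __name__ == "__main__":
    facts = {"user": "ada", "project": "kestrel", "deadline": "friday", "lang": "rust"}
    with tempfile.TemporaryDirectory() as tmp:
        layers = {"memory": MemoryLayer(),
                  "json": JsonFileLayer(Path(tmp) / "state.json"),
                  "sqlite": SqliteLayer(Path(tmp) / "state.db")}
        for agent in (KeyValueAgent(), BoundedAgent(capacity=2)):
            print(type(agent).__name__, run_continuity_benchmark(agent, layers, facts, 500))
        layers["sqlite"].conn.close()
```

With four facts and `capacity=2`, the bounded agent scores 0.5 on every layer, because it evicts facts before they are ever persisted. A drop confined to one layer (for example, only `sqlite` scoring lower) would instead point at a storage or serialization fault. Scoring per layer is what lets the suite tell these two failure modes apart.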