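Build an open-source evaluation platform for LLM long-term memory based on interactive, gamified environments. Current evals are too static: they rely on fixed inputs rather than testing what a model retains and uses over the course of an interaction. An interactive platform would set a new bar for research on LLM agency.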
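
To make the idea concrete, here is a minimal sketch of what one interactive memory environment could look like. Everything in it is an assumption for illustration only: the `KeyRecallGame` class, the vault-code task, the `reset`/`step` loop (borrowed from the style common to reinforcement-learning environments), and the two toy policies are hypothetical and not part of any existing platform. The environment reveals a fact once, runs several distractor turns, and only then asks for the fact, so the score depends on what the agent carried across the episode.

```python
import random


class KeyRecallGame:
    """Hypothetical toy environment: reveal a code once, distract, then ask for it."""

    def __init__(self, n_distractors: int = 5, seed: int = 0):
        rng = random.Random(seed)
        self._secret = f"{rng.randrange(10_000):04d}"  # 4-digit code as a string
        self._observations = (
            [f"The vault code is {self._secret}. Remember it."]
            + [f"Distractor room {i + 1}: say anything to continue."
               for i in range(n_distractors)]
            + ["What is the vault code?"]
        )
        self._index = 0  # index of the observation most recently shown

    def reset(self) -> str:
        """Start an episode and return the first observation (the reveal)."""
        self._index = 0
        return self._observations[0]

    def step(self, action: str) -> tuple[str, float, bool]:
        """Advance one turn; returns (observation, reward, done)."""
        if self._index == len(self._observations) - 1:
            # The last observation shown was the question, so score the answer.
            reward = 1.0 if action.strip() == self._secret else 0.0
            return "Episode over.", reward, True
        self._index += 1
        return self._observations[self._index], 0.0, False


def run_episode(env: KeyRecallGame, policy) -> float:
    """Run one episode with a policy mapping an observation to an action."""
    obs, reward, done = env.reset(), 0.0, False
    while not done:
        obs, reward, done = env.step(policy(obs))
    return reward


def make_remembering_policy():
    """Stand-in for an agent with working memory (assumed, not a real LLM)."""
    memory = {}

    def policy(obs: str) -> str:
        if obs.startswith("The vault code is "):
            memory["code"] = obs.split()[4].rstrip(".")
        if "What is the vault code" in obs:
            return memory.get("code", "")
        return "continue"

    return policy


def forgetful_policy(obs: str) -> str:
    """Stand-in for an agent that keeps no state between turns."""
    return "unknown" if "What is the vault code" in obs else "continue"
```

In a real platform the policy would wrap an LLM call, and the distractor count would be one way to vary how long information has to be retained; both are left open here.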