Yihang Ding, Wanke Xia, Yiting Zhao, Jinbo Su, Jialiang Yang, Zhengbo Zhang, Ke Wang, Wenming Yang
Develop an interactive evaluation framework for LLM memory systems, built on gamified environments. It fills the gap left by static, context-only benchmarks by measuring long-term state tracking and reasoning.
Suggested repo: memground
"Stop testing memory with static RAG; start testing it in dynamic worlds."
Estimated effort: 50h
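As a rough illustration of what "interactive" could mean here, the sketch below sets up a toy world whose ground truth changes every turn and probes a memory agent at intervals. Everything in it is a hypothetical stand-in and not part of the original proposal: the names `InventoryWorld`, `LatestStateAgent`, `FirstSeenAgent`, and `evaluate`, the `"<item> moved to <room>"` event format, and the accuracy metric are all assumptions. A real framework would replace the dictionary-backed agents with an LLM memory system.

```python
import random
from dataclasses import dataclass, field


@dataclass
class InventoryWorld:
    """Hypothetical toy world: items move between rooms, so the truth drifts."""
    rooms: tuple = ("kitchen", "cellar", "attic")
    items: tuple = ("key", "lamp", "map")
    seed: int = 0
    state: dict = field(default_factory=dict)

    def __post_init__(self):
        self.rng = random.Random(self.seed)  # seeded for reproducible episodes
        self.state = {item: self.rng.choice(self.rooms) for item in self.items}

    def initial_events(self):
        """Describe the starting state in the same format as later events."""
        return [f"{item} moved to {room}" for item, room in self.state.items()]

    def step(self):
        """Move one random item and return the event text the agent observes."""
        item = self.rng.choice(self.items)
        room = self.rng.choice(self.rooms)
        self.state[item] = room
        return f"{item} moved to {room}"


class LatestStateAgent:
    """Baseline memory that overwrites each item's location on every event."""

    def __init__(self):
        self.memory = {}

    def observe(self, event):
        item, room = event.split(" moved to ")
        self.memory[item] = room

    def answer(self, item):
        return self.memory.get(item)


class FirstSeenAgent(LatestStateAgent):
    """Stale baseline that keeps only the first location seen for each item."""

    def observe(self, event):
        item, room = event.split(" moved to ")
        self.memory.setdefault(item, room)


def evaluate(agent, world, turns=50, probe_every=5):
    """Run an episode and return the fraction of probes answered correctly."""
    for event in world.initial_events():
        agent.observe(event)
    correct = total = 0
    for turn in range(1, turns + 1):
        agent.observe(world.step())
        if turn % probe_every == 0:
            # Probe every item against the world's current ground truth.
            for item in world.items:
                total += 1
                correct += agent.answer(item) == world.state[item]
    return correct / total


if __name__ == "__main__":
    print("latest-state:", evaluate(LatestStateAgent(), InventoryWorld(seed=0)))
    print("first-seen:  ", evaluate(FirstSeenAgent(), InventoryWorld(seed=0)))
```

Because the probes are scored against state that keeps changing, an agent that only remembers an early snapshot can be told apart from one that tracks updates, which is the distinction a static, context-only benchmark does not expose.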