Amit Dhanda
Develop a benchmarking tool that tests a model's belief revision when premises are modified dynamically, i.e. whether it updates its conclusions once a stated fact changes. This capability is critical for building agents that operate in changing environments.
Suggested repo: DeltaBench
"Benchmark your model's ability to actually change its mind when the facts change."
Estimated effort: 25h
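
One way the benchmark described above might be structured is sketched below in Python. This is only an illustration under stated assumptions, not a design from the original idea: the `DeltaCase` record, the `Model` callable interface, the `evaluate` function, and the two metric names are all hypothetical. Each case asks the same question twice, once with the original premises and once after an update is appended, and checks whether the answer moves to the new expected value.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical model interface (an assumption for this sketch):
# takes a list of premises and a question, returns an answer string.
Model = Callable[[List[str], str], str]


@dataclass
class DeltaCase:
    """One test item: premises, a later update, and the expected answers."""
    premises: List[str]
    update: str          # premise introduced after the first answer
    question: str
    answer_before: str   # correct answer given only the original premises
    answer_after: str    # correct answer once the update is included


def evaluate(model: Model, cases: List[DeltaCase]) -> Dict[str, float]:
    """Return the share of cases where the model revised vs. stayed stuck."""
    if not cases:
        raise ValueError("at least one case is required")
    revised = stuck = 0
    for case in cases:
        first = model(case.premises, case.question).strip().lower()
        second = model(case.premises + [case.update], case.question).strip().lower()
        if second == case.answer_after.lower():
            revised += 1   # answer matches the updated premises
        elif second == first == case.answer_before.lower():
            stuck += 1     # answer did not move despite the update
    n = len(cases)
    return {"revision_rate": revised / n, "stuck_rate": stuck / n}


# Two toy stand-ins for a real model, used only to exercise the harness.
def stubborn_model(premises: List[str], question: str) -> str:
    return "monday"  # ignores every premise, including updates


def attentive_model(premises: List[str], question: str) -> str:
    return "tuesday" if any("Tuesday" in p for p in premises) else "monday"


cases = [
    DeltaCase(
        premises=["The meeting is on Monday."],
        update="Correction: the meeting has moved to Tuesday.",
        question="What day is the meeting?",
        answer_before="monday",
        answer_after="tuesday",
    )
]

stubborn_scores = evaluate(stubborn_model, cases)
attentive_scores = evaluate(attentive_model, cases)
```

Reporting a separate `stuck_rate` alongside `revision_rate` is a design choice of this sketch: it distinguishes a model that clings to its earlier answer from one that changes its answer but lands on something wrong. Exact string matching is used for simplicity; a real harness would likely need a more tolerant answer comparison.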