Build a regression testing framework for agentic code tools that tracks 'laziness' and 'correctness' decay over time. By capturing model responses to a fixed suite of complex refactoring tasks and comparing the results across model versions, developers can quantify performance regressions following model updates.
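
The idea above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a definitive design. The names `TaskResult`, `is_lazy`, `summarize`, and `regression` are hypothetical. The brief does not define the two metrics, so the sketch assumes 'laziness' can be approximated by elision markers in the response (such as "# ..." or "rest of the code") and 'correctness' by whether each task's own tests pass.

```python
import re
from dataclasses import dataclass
from statistics import mean

# Hypothetical markers of an elided ("lazy") answer. These are assumptions
# for illustration; a real suite would tune them against labeled responses.
LAZY_PATTERNS = [
    re.compile(r"(#|//)\s*\.\.\."),  # "# ..." or "// ..." placeholder comments
    re.compile(r"rest of (the )?(code|file|implementation)", re.IGNORECASE),
    re.compile(r"(remains|stays) (the same|unchanged)", re.IGNORECASE),
    re.compile(r"\bTODO\b"),
]


@dataclass(frozen=True)
class TaskResult:
    """One captured model response to one refactoring task."""
    task_id: str
    model_version: str
    response: str
    tests_passed: bool  # outcome of running the task's own test suite


def is_lazy(response: str) -> bool:
    """Return True if the response contains any elision marker."""
    return any(p.search(response) for p in LAZY_PATTERNS)


def summarize(results: list[TaskResult]) -> dict[str, float]:
    """Compute the pass rate and the lazy-response rate for one model version."""
    if not results:
        raise ValueError("no results to summarize")
    return {
        "correctness": mean(1.0 if r.tests_passed else 0.0 for r in results),
        "laziness": mean(1.0 if is_lazy(r.response) else 0.0 for r in results),
    }


def regression(baseline: list[TaskResult], candidate: list[TaskResult]) -> dict[str, float]:
    """Positive values mean the candidate version is worse than the baseline."""
    base, cand = summarize(baseline), summarize(candidate)
    return {
        "correctness_drop": base["correctness"] - cand["correctness"],
        "laziness_rise": cand["laziness"] - base["laziness"],
    }
```

Running the same task suite against each model version and storing the `TaskResult` records (for example, one JSON line per record) would give a history to diff with `regression`. Pattern matching is a crude proxy: a legitimate `TODO` in refactored code would be counted as lazy, so the marker list should be validated before its numbers are trusted.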