Create an open-source evaluation suite for agentic workflows in enterprise settings. Existing evaluation tools focus largely on coding tasks, which leaves a gap: measuring how consistently an agent applies 'business logic' in advertising and media operations.