Create an open-source evaluation suite for agentic workflows in enterprise settings. Current tools focus on coding, but there is a gap in measuring 'business logic' consistency for advertising/media ops.
Suggested repo: AgentAudit
"Stop guessing if your enterprise agents are actually working."
Estimated effort: 40h