Eren Unlu
View original ↗Build an agent evaluation suite that implements the SSTA-32 diagnostic framework to test whether agents can identify blockers before executing. This addresses the common issue of agents blindly failing tasks they aren't equipped to complete.
Suggested repo: TriageEval
"Stop your agents from attempting the impossible—diagnose first, execute second."
Estimated effort: 30h