edwardsrobbie
Develop an open-source evaluation benchmark that measures "AI security marketing" claims against reality, quantifying how effective LLMs actually are at zero-day discovery. Standardize the reporting of tool capabilities to combat hype.
Suggested repo: eval-honesty
"How good are they really? Building a baseline to audit AI security claims."
Estimated effort: 60h
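A minimal sketch of what the scoring core of such a benchmark could look like: a harness that compares a tool's claimed findings against a curated ground-truth set and reports precision, recall, and false positives. The `Finding` format and all names here are hypothetical assumptions, not part of any existing eval-honesty codebase.

```python
# Hypothetical claim-auditing harness for the proposed benchmark.
# The Finding schema and score_tool API are illustrative assumptions.
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    vuln_id: str   # identifier of the claimed vulnerability
    target: str    # codebase or binary the claim applies to


def score_tool(reported: set, ground_truth: set) -> dict:
    """Compare a tool's reported findings against a curated ground-truth set."""
    true_pos = reported & ground_truth
    precision = len(true_pos) / len(reported) if reported else 0.0
    recall = len(true_pos) / len(ground_truth) if ground_truth else 0.0
    return {
        "precision": round(precision, 3),        # fraction of claims that are real
        "recall": round(recall, 3),              # fraction of real bugs found
        "false_positives": len(reported - ground_truth),
    }


if __name__ == "__main__":
    # Toy example: two real bugs, the tool claims one of them plus a spurious one.
    truth = {Finding("VULN-1", "libfoo"), Finding("VULN-2", "libfoo")}
    claims = {Finding("VULN-1", "libfoo"), Finding("VULN-9", "libfoo")}
    print(score_tool(claims, truth))  # → {'precision': 0.5, 'recall': 0.5, 'false_positives': 1}
```

Publishing scores in this standardized shape, rather than vendor-chosen anecdotes, is the mechanism by which the benchmark would combat hype.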