Firoj Alam, Gagan Bhatia, Sahinur Rahman Laskar, Shammur Absar Chowdhury
Ship a Python package for deterministic model evaluation. It replaces expensive LLM-as-a-judge setups with fast models of under 1B parameters that are practical to run in CI/CD pipelines.
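The card does not specify an API, so what follows is only a sketch of how such a CI gate might look. `ci_gate`, `GateResult`, and the `(prediction, reference) -> float` scorer signature are hypothetical. A plain token-overlap F1 stands in for the sub-1B judge model. The point it illustrates is determinism: identical inputs always produce an identical score, so a fixed threshold fails a build reproducibly.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Iterable, Tuple

# Hypothetical scorer signature: (prediction, reference) -> score in [0, 1].
Scorer = Callable[[str, str], float]


def token_f1(prediction: str, reference: str) -> float:
    """Stand-in deterministic metric (token-overlap F1), NOT a sub-1B model."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    if not pred or not ref:
        # Two empty strings match perfectly; one empty string scores zero.
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


@dataclass(frozen=True)
class GateResult:
    mean_score: float
    passed: bool


def ci_gate(
    pairs: Iterable[Tuple[str, str]],
    scorer: Scorer = token_f1,
    threshold: float = 0.7,  # hypothetical default; tune per project
) -> GateResult:
    """Score every (prediction, reference) pair and compare the mean to a threshold."""
    scores = [scorer(p, r) for p, r in pairs]
    if not scores:
        raise ValueError("ci_gate needs at least one (prediction, reference) pair")
    avg = mean(scores)
    return GateResult(mean_score=avg, passed=avg >= threshold)
```

In a CI job, a small script would call `ci_gate` on a fixed evaluation set and exit with a non-zero status when `passed` is `False`. Swapping `token_f1` for a local small-model scorer with the same signature would leave the gate logic unchanged.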
Suggested repo: omniscore-lite
"Replace your $100/day LLM judge with a local, deterministic metric."
Estimated effort: 50h