Leen AlQadi, Ahmed Alzubaidi, Mohammed Alyafeai, Hamza Alobeidli, Maitha Alhammadi, Shaikha Alsuwaidi, Omar Alkaabi, Basma El Amel Boussaha, Hakim Hacid
Create an automated Arabic LLM evaluation pipeline that performs multi-model verification, setting a new standard for localized language benchmarks.
Suggested repo: QIMMA-eval
"Finally, an Arabic benchmark that actually measures quality, not just scale."
Estimated effort: 60h
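The multi-model verification step described above can be sketched as a majority vote across several judge models. This is a minimal illustration, not the authors' implementation: the `Judge` type, the stand-in judge functions, and the majority-vote rule are all hypothetical placeholders for real LLM calls.

```python
from typing import Callable, List

# A judge takes (question, answer) and returns True if it approves the answer.
Judge = Callable[[str, str], bool]

def multi_model_verify(question: str, answer: str, judges: List[Judge]) -> bool:
    """Accept an answer only if a strict majority of judge models approve it."""
    votes = sum(judge(question, answer) for judge in judges)
    return votes * 2 > len(judges)

# Stand-in judges: in a real pipeline each would query a different LLM.
lenient = lambda q, a: True           # hypothetical: always approves
strict = lambda q, a: len(a) > 10     # hypothetical: requires a substantive answer
moderate = lambda q, a: a != ""       # hypothetical: rejects empty answers

print(multi_model_verify("ما عاصمة فرنسا؟", "باريس هي عاصمة فرنسا.", [lenient, strict, moderate]))
```

In a production pipeline, each judge would wrap an API call to a distinct model, and disagreements could be routed to human review rather than resolved by simple majority.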