Yang Liu, Hongming Li, Melissa Xiaohui Qin, Qiankun Liu, Chao Huang
View original ↗Build an open-source evaluation framework that allows developers to run the SemanticQA benchmark against any local LLM via an API. This helps developers quantify how well their fine-tuned models handle idiomatic expressions and complex noun compounds.
Suggested repo: semantic-eval
"Stop guessing if your model understands idioms—test it with SemanticQA."
Estimated effort: 20h