Eftychia Makri, Nikolaos Nakis, Laura Sisson, Gigi Minsky, Leandros Tassiulas, Vahid Satarifard, Nicholas A. Christakis
Develop an evaluation suite of odor reasoning tasks to test how well LLMs are grounded in olfactory perception. This makes it possible to benchmark models on a human sensory experience that is not natively textual.
Suggested repo: olfact-eval
"Can your LLM actually smell? A benchmark for olfactory reasoning."
Estimated effort: 20h
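As a starting point for the evaluation suite described above, here is a minimal sketch of a multiple-choice harness. Everything in it is an assumption made for illustration: the `OdorItem` schema, the `evaluate` function, the `first_choice_baseline` model, and the three placeholder questions are hypothetical and are not taken from the original work or any published dataset. A real model would be plugged in as a callable that receives the prompt and the choices and returns the index of its answer.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class OdorItem:
    """One multiple-choice odor question (hypothetical schema)."""
    prompt: str
    choices: Sequence[str]
    answer: int  # index of the correct entry in `choices`


# Illustrative placeholder items, not drawn from any published benchmark.
ITEMS = [
    OdorItem("Which descriptor best fits vanillin?", ("vanilla", "fishy", "sulfurous"), 0),
    OdorItem("Which descriptor best fits limonene?", ("smoky", "citrus", "musty"), 1),
    OdorItem("Which descriptor best fits geosmin?", ("fruity", "minty", "earthy"), 2),
]

# A model is any callable mapping (prompt, choices) to a choice index.
Model = Callable[[str, Sequence[str]], int]


def evaluate(model: Model, items: Sequence[OdorItem]) -> float:
    """Return the fraction of items on which the model picks the correct index."""
    if not items:
        raise ValueError("items must be non-empty")
    correct = sum(1 for item in items if model(item.prompt, item.choices) == item.answer)
    return correct / len(items)


def first_choice_baseline(prompt: str, choices: Sequence[str]) -> int:
    """Trivial baseline that always picks the first option."""
    return 0


if __name__ == "__main__":
    print(f"baseline accuracy: {evaluate(first_choice_baseline, ITEMS):.2f}")
```

Keeping the model behind a plain callable means the same items can score an API-backed LLM, a local model, or a fixed baseline without changing the scoring code. Spreading the correct answers across different positions keeps a constant-guess baseline near chance.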