Develop an evaluation suite of odor reasoning tasks to test the sensory grounding of LLMs. Such a suite would allow models to be benchmarked on a non-textual domain of human sensory experience.
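
As a rough illustration of what a minimal harness for such a suite might look like, the Python sketch below scores a model on multiple-choice odor questions. Everything in it is an assumption made for illustration: the `OdorItem` schema, the `evaluate` function, the `first_choice_baseline` stand-in model, and the two placeholder items are hypothetical and do not come from an existing benchmark. A real suite would need a curated dataset and possibly other task formats.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class OdorItem:
    """One multiple-choice odor reasoning question (hypothetical schema)."""
    prompt: str
    choices: Sequence[str]
    answer_index: int  # index into `choices` of the reference answer


# A "model" here is any callable that maps (prompt, choices) to a choice index.
Model = Callable[[str, Sequence[str]], int]


def evaluate(model: Model, items: Sequence[OdorItem]) -> float:
    """Return the fraction of items on which the model picks the reference choice."""
    if not items:
        raise ValueError("items must be non-empty")
    correct = sum(
        1 for item in items
        if model(item.prompt, item.choices) == item.answer_index
    )
    return correct / len(items)


# Illustrative placeholder items only, not a real dataset.
ITEMS = [
    OdorItem(
        "Which descriptor best matches the smell of vanillin?",
        ("vanilla", "rotten egg", "gasoline"),
        0,
    ),
    OdorItem(
        "Which descriptor best matches the smell of hydrogen sulfide?",
        ("citrus", "rotten egg", "mint"),
        1,
    ),
]


def first_choice_baseline(prompt: str, choices: Sequence[str]) -> int:
    """Trivial stand-in for an LLM: always picks the first option."""
    return 0


if __name__ == "__main__":
    print(f"accuracy: {evaluate(first_choice_baseline, ITEMS):.2f}")
```

In practice, the baseline would be replaced by a wrapper that sends the prompt and choices to an LLM and parses its selected option. Keeping the model behind a plain callable means the scoring logic stays independent of any particular model API.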