Yalun Wu, Haotian Liu, Zhoujun Li, Boyang Wang
Develop the PilotBench environment in Python to allow for standardized evaluation of LLM-based aviation agents. This is a niche but high-stakes application of embodied AI.
Suggested repo: pilot-bench-gym
"Can your LLM fly a plane? Find out with this safety-critical benchmark."
Estimated effort: 50h