arXiv · 17h ago

PilotBench: A Benchmark for General Aviation Agents with Safety Constraints

Yalun Wu, Haotian Liu, Zhoujun Li, Boyang Wang


Analysis

Viral velocity: low

Implementation gap: YES

Novelty: 7/10

Category: paper

Topics: agents, robotics, evaluation

Opportunity Brief

Implement the PilotBench environment in Python so that LLM-based aviation agents can be evaluated in a standardized way. General aviation is a niche but high-stakes application of embodied AI.
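
To make the idea concrete, here is a minimal sketch of what a Gym-style evaluation loop with safety checks might look like. Everything in it is an assumption for illustration: the class `ToyPilotEnv`, the `evaluate` helper, the two example policies, and the numeric limits are hypothetical and are not taken from the PilotBench paper, whose actual tasks, state representation, and metrics may differ substantially.

```python
from dataclasses import dataclass

# Hypothetical safety limits, chosen for illustration only (not from the paper).
MIN_ALTITUDE_FT = 500.0
STALL_SPEED_KT = 50.0
MAX_SPEED_KT = 160.0


@dataclass
class AircraftState:
    altitude_ft: float
    airspeed_kt: float


class ToyPilotEnv:
    """Toy Gym-style environment (reset/step) that flags safety violations."""

    def __init__(self, target_altitude_ft=3000.0, max_steps=50):
        self.target_altitude_ft = target_altitude_ft
        self.max_steps = max_steps
        self.state = AircraftState(2000.0, 100.0)
        self.steps = 0

    def reset(self):
        self.state = AircraftState(2000.0, 100.0)
        self.steps = 0
        return self.state

    def violations(self):
        """Return the names of all safety limits the current state breaks."""
        s = self.state
        found = []
        if s.altitude_ft < MIN_ALTITUDE_FT:
            found.append("below_min_altitude")
        if s.airspeed_kt < STALL_SPEED_KT:
            found.append("below_stall_speed")
        if s.airspeed_kt > MAX_SPEED_KT:
            found.append("overspeed")
        return found

    def step(self, climb_ft, accel_kt):
        """Apply one action: an altitude change (ft) and an airspeed change (kt)."""
        self.state = AircraftState(
            self.state.altitude_ft + climb_ft,
            self.state.airspeed_kt + accel_kt,
        )
        self.steps += 1
        violations = self.violations()
        # Reward is the negative distance to the target altitude.
        reward = -abs(self.target_altitude_ft - self.state.altitude_ft)
        # Any safety violation ends the episode immediately.
        done = bool(violations) or self.steps >= self.max_steps
        return self.state, reward, done, {"violations": violations}


def evaluate(policy, env):
    """Run one episode; report final altitude error, violations, and step count."""
    state = env.reset()
    done = False
    info = {"violations": []}
    while not done:
        climb_ft, accel_kt = policy(state, env.target_altitude_ft)
        state, _, done, info = env.step(climb_ft, accel_kt)
    return {
        "final_error_ft": abs(env.target_altitude_ft - state.altitude_ft),
        "violations": info["violations"],
        "steps": env.steps,
    }


def safe_policy(state, target_ft):
    # Climb toward the target by at most 100 ft per step and hold airspeed.
    delta = target_ft - state.altitude_ft
    return max(-100.0, min(100.0, delta)), 0.0


def reckless_policy(state, target_ft):
    # Climb fast while losing 20 kt of airspeed per step.
    return 200.0, -20.0


if __name__ == "__main__":
    print(evaluate(safe_policy, ToyPilotEnv()))
    print(evaluate(reckless_policy, ToyPilotEnv()))
```

In a real harness the policy function would wrap an LLM call that maps a textual or structured state description to an action; reporting violations separately from task error keeps safety from being averaged away in a single score.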

Suggested repo: pilot-bench-gym

"Can your LLM fly a plane? Find out with this safety-critical benchmark."

Estimated effort: 50h