Jon M Laurent, Albert Bou, Michael Pieler, Conor Igoe, Alex Andonian, Siddharth Narayanan, James Braza, Alexandros Sanchez Vassopoulos, Jacob L Steenwyk, Blake Lash, Andrew D White, Samuel G Rodriques
Build a standardized framework that provides a gym-like environment for evaluating autonomous biology research agents. This should bridge the gap between abstract LLM benchmarks and actual physical/simulated lab protocols.
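The following is a minimal sketch of what a "gym-like" episode interface could look like. It assumes the Gymnasium `reset`/`step` return conventions: `reset` returns `(observation, info)` and `step` returns `(observation, reward, terminated, truncated, info)`.

Everything else is a hypothetical illustration, not part of LAB-Bench or any existing package. That includes `ProtocolTask`, `ProtocolEnv`, the step names, and the all-or-nothing reward. The sketch does not import Gymnasium.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class ProtocolTask:
    """Hypothetical task spec: an ordered list of protocol steps to execute."""
    name: str
    steps: List[str]
    max_actions: int = 10  # hypothetical budget before the episode is truncated


class ProtocolEnv:
    """Toy gym-style environment following Gymnasium's reset/step return
    conventions, without depending on the gymnasium package."""

    def __init__(self, task: ProtocolTask) -> None:
        self.task = task
        self.completed: List[str] = []
        self.n_actions = 0
        self._over = False

    def _obs(self) -> Dict[str, Any]:
        # What the agent sees: the task, the steps done so far, how many remain.
        return {
            "task": self.task.name,
            "completed": list(self.completed),
            "remaining": len(self.task.steps) - len(self.completed),
        }

    def reset(self, seed: Optional[int] = None) -> Tuple[Dict[str, Any], Dict[str, Any]]:
        # `seed` is accepted for API parity only; this toy env is deterministic.
        self.completed = []
        self.n_actions = 0
        self._over = False
        return self._obs(), {}

    def step(self, action: str):
        if self._over:
            raise RuntimeError("episode has ended; call reset() first")
        self.n_actions += 1
        expected = self.task.steps[len(self.completed)]
        if action == expected:
            self.completed.append(action)
            terminated = len(self.completed) == len(self.task.steps)
            reward = 1.0 if terminated else 0.0  # sparse: credit only on completion
        else:
            # Out-of-order or unknown step ends the episode with no reward.
            terminated = True
            reward = 0.0
        truncated = not terminated and self.n_actions >= self.task.max_actions
        self._over = terminated or truncated
        return self._obs(), reward, terminated, truncated, {"expected": expected}


# Usage: a scripted "agent" that replays the correct sequence.
steps = ["resuspend", "lyse", "neutralize", "centrifuge", "elute"]  # illustrative only
env = ProtocolEnv(ProtocolTask("toy-miniprep", steps))
obs, info = env.reset()
total = 0.0
for action in steps:
    obs, reward, terminated, truncated, info = env.step(action)
    total += reward
    if terminated or truncated:
        break
print(total, obs["remaining"])  # prints: 1.0 0
```

In this sketch, credit is given only when the full protocol is completed, which keeps the reward sparse and unambiguous. A real framework would more likely plug in per-step partial credit or a simulator-backed validity check. An LLM agent would sit in place of the scripted loop.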
Suggested repo: lab-bench-env
"Stop testing agents with puzzles; test them with real scientific hypotheses."
Estimated effort: 80h