Tianyu Liu, Sihan Jiang, Fan Zhang, Kunyang Sun, Teresa Head-Gordon, Hongyu Zhao
Build a structured evaluation framework (a leaderboard) specifically for chemical and drug discovery RAG pipelines. Provide standardized datasets that assess reasoning over molecular SMILES strings.
Suggested repo: drug-bench
"Finally, a real benchmark for LLM drug discovery performance."
Estimated effort: 40h
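
A minimal sketch of how the scoring core of such a leaderboard might look, assuming exact-match grading over short answers. Everything here is hypothetical and for illustration only: the `BenchmarkItem` fields, the `accuracy` and `leaderboard` helpers, the pipeline names, and the three toy questions are not part of any existing dataset or repo. A real benchmark would likely need chemistry-aware comparison (for example, canonicalizing SMILES answers with a cheminformatics toolkit) rather than plain string matching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkItem:
    """One hypothetical benchmark question about a molecule given as SMILES."""
    question: str
    smiles: str
    expected: str


def normalize(answer: str) -> str:
    """Trim and lowercase so trivial formatting differences are not penalized."""
    return answer.strip().lower()


def accuracy(items, predictions):
    """Fraction of items whose prediction exactly matches the expected answer."""
    if len(items) != len(predictions):
        raise ValueError("need exactly one prediction per item")
    if not items:
        return 0.0
    correct = sum(
        normalize(pred) == normalize(item.expected)
        for item, pred in zip(items, predictions)
    )
    return correct / len(items)


def leaderboard(items, submissions):
    """Rank pipelines by accuracy, best first; ties are broken by name."""
    scores = {name: accuracy(items, preds) for name, preds in submissions.items()}
    return sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))


# Toy data for illustration only, not a real dataset.
ITEMS = [
    BenchmarkItem("How many carbon atoms are in this molecule?", "CCO", "2"),
    BenchmarkItem("Does this molecule contain an aromatic ring?", "c1ccccc1", "yes"),
    BenchmarkItem("How many oxygen atoms are in this molecule?", "CC(=O)O", "2"),
]

# Hypothetical pipeline outputs, one answer per item, in item order.
SUBMISSIONS = {
    "pipeline_a": ["2", "Yes", "2"],
    "pipeline_b": ["2", "no", "1"],
}

if __name__ == "__main__":
    for rank, (name, score) in enumerate(leaderboard(ITEMS, SUBMISSIONS), start=1):
        print(f"{rank}. {name}: {score:.2f}")
```

Keeping the items, the grader, and the ranking as separate pieces means the grading rule can later be swapped for a stricter or more chemistry-aware one without touching the datasets or the submission format.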