arXiv · 4d ago

DrugPlayGround: Benchmarking Large Language Models and Embeddings for Drug Discovery

Tianyu Liu, Sihan Jiang, Fan Zhang, Kunyang Sun, Teresa Head-Gordon, Hongyu Zhao


Analysis

Viral velocity: low

Implementation gap: YES

Novelty: 5/10

Category: tool

Topics: rag, llm, science

Opportunity Brief

Build a structured evaluation framework (a leaderboard) specifically for chemical and drug discovery RAG pipelines. Provide standardized datasets that assess reasoning over molecular SMILES strings.

Suggested repo: drug-bench

"Finally, a real benchmark for LLM drug discovery performance."

Estimated effort: 40h
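
As a rough illustration of the leaderboard idea in the brief, here is a minimal Python sketch. It is not taken from the DrugPlayGround paper: the names `BenchmarkItem`, `evaluate`, and `leaderboard`, the exact-match metric, the four toy questions, and the two stand-in pipelines are all assumptions made for this example. A real harness would likely parse SMILES with a cheminformatics toolkit and use richer metrics.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class BenchmarkItem:
    """One hypothetical benchmark question about a molecule given as SMILES."""
    smiles: str
    question: str
    expected: str


# A pipeline is any callable mapping (smiles, question) to an answer string.
# In practice this would wrap a RAG or LLM call; here it is a plain function.
Pipeline = Callable[[str, str], str]


def evaluate(pipeline: Pipeline, items: List[BenchmarkItem]) -> float:
    """Case-insensitive exact-match accuracy of one pipeline over the items."""
    if not items:
        return 0.0
    correct = sum(
        1
        for item in items
        if pipeline(item.smiles, item.question).strip().lower()
        == item.expected.strip().lower()
    )
    return correct / len(items)


def leaderboard(
    pipelines: Dict[str, Pipeline], items: List[BenchmarkItem]
) -> List[Tuple[str, float]]:
    """Score every pipeline and rank by accuracy (descending), then by name."""
    scores = [(name, evaluate(fn, items)) for name, fn in pipelines.items()]
    return sorted(scores, key=lambda row: (-row[1], row[0]))


# Toy dataset (assumption): ethanol, benzene, acetic acid, ethylamine.
QUESTION = "Does the molecule contain oxygen?"
ITEMS = [
    BenchmarkItem("CCO", QUESTION, "yes"),
    BenchmarkItem("c1ccccc1", QUESTION, "no"),
    BenchmarkItem("CC(=O)O", QUESTION, "yes"),
    BenchmarkItem("CCN", QUESTION, "no"),
]


def always_yes(smiles: str, question: str) -> str:
    """Trivial baseline that ignores its input."""
    return "yes"


def char_scan(smiles: str, question: str) -> str:
    """Toy heuristic: look for an 'O' or 'o' character in the SMILES string.
    This is not a real parser and would misfire on symbols such as 'Os'."""
    return "yes" if "o" in smiles.lower() else "no"


PIPELINES = {"always_yes": always_yes, "char_scan": char_scan}

if __name__ == "__main__":
    for rank, (name, score) in enumerate(leaderboard(PIPELINES, ITEMS), start=1):
        print(f"{rank}. {name}: {score:.2f}")
```

Keeping the pipeline as a plain callable means any retrieval or model backend can be plugged in without changing the scoring code. That separation is the main design choice this sketch is meant to show.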