Xiaohan Ren, Chenxiao Fan, Wenyin Ma, Hongliang He, Chongming Gao, Xiaoyan Zhao, Fuli Feng
Develop an open-source evaluation suite for medical reasoning that moves beyond multiple-choice questions and verifies the model's chain of thought, not just its final answer. This gives the community a shared standard for assessing clinical-grade AI.
Suggested repo: med-reason-bench
"Stop grading medical AI on trivia; grade it on clinical reasoning."
Estimated effort: 45h
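
As a rough illustration of what "verifying the chain of thought" could mean in such a suite, here is a minimal sketch. It assumes each benchmark case carries a reference answer plus a list of key reasoning steps, and it scores a model's rationale by how many of those steps it mentions. The names `Case`, `grade`, and `required_steps` are hypothetical, the sample case is toy data, and plain substring matching is only a crude stand-in for a real verifier (for example, clinician review or a model-based judge).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Case:
    """One benchmark item (hypothetical schema, not from the source)."""
    question: str
    answer: str
    required_steps: List[str]  # key findings a sound rationale should mention


def grade(case: Case, model_answer: str, model_rationale: str) -> dict:
    """Score the final answer and the reasoning separately.

    Assumption: a step counts as covered if its text appears in the
    rationale (case-insensitive substring match).
    """
    rationale = model_rationale.lower()
    missing = [s for s in case.required_steps if s.lower() not in rationale]
    total = len(case.required_steps)
    coverage = (total - len(missing)) / total if total else 0.0
    answer_ok = model_answer.strip().lower() == case.answer.strip().lower()
    return {
        "answer_correct": answer_ok,
        "step_coverage": coverage,
        "missing": missing,
    }


if __name__ == "__main__":
    # Toy example for illustration only; not real benchmark data.
    case = Case(
        question="Chest pain with ST elevation in II, III, aVF. Diagnosis?",
        answer="Inferior MI",
        required_steps=["ST elevation", "troponin"],
    )
    print(grade(case, "inferior mi", "ST elevation in II, III, aVF."))
```

Reporting `answer_correct` and `step_coverage` separately is the point of the design: a model that reaches the right answer with an incomplete rationale is distinguishable from one that reasons through every expected step.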