Zonghuan Xu, Xiang Zheng, Yutao Wu, Xingjun Ma
Create an open-source library that supports human-in-the-loop (HITL) calibration for LLM-based risk evaluators, helping developers replace pure LLM judges with hybrid human-AI scoring pipelines. A minimal sketch of such a calibration loop appears below this entry.
Suggested repo: truth-gauge
"Your LLM judge is biased—calibrate its risk assessments with real human feedback."
Estimated effort: 30h
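To make the idea concrete, here is a minimal sketch of one plausible calibration loop: fit a Platt-scaling map from raw LLM judge scores to human-grounded risk probabilities, then route ambiguous cases to human review. All names here (`calibrate_judge`, `HybridScorer`, the threshold values) are hypothetical illustrations, not an API of the proposed truth-gauge library.

```python
# Hypothetical sketch of HITL calibration for an LLM risk judge.
# Assumes raw judge scores in [0, 1] and binary human labels (1 = risky).
import numpy as np
from sklearn.linear_model import LogisticRegression


def calibrate_judge(raw_scores: np.ndarray, human_labels: np.ndarray) -> LogisticRegression:
    """Fit a Platt-scaling map from raw LLM judge scores to calibrated risk probabilities."""
    model = LogisticRegression()
    model.fit(raw_scores.reshape(-1, 1), human_labels)
    return model


class HybridScorer:
    """Trust the calibrated judge on clear cases; route ambiguous ones to a human."""

    def __init__(self, calibrator: LogisticRegression, low: float = 0.3, high: float = 0.7):
        self.calibrator = calibrator
        self.low, self.high = low, high  # ambiguity band that triggers human review

    def score(self, raw_score: float) -> dict:
        p_risk = self.calibrator.predict_proba([[raw_score]])[0, 1]
        return {"p_risk": float(p_risk), "needs_human": self.low < p_risk < self.high}


# Usage: calibrate on a small human-labeled set, then score new judge outputs.
rng = np.random.default_rng(0)
raw = rng.uniform(size=200)
labels = (raw + rng.normal(scale=0.2, size=200) > 0.6).astype(int)  # synthetic stand-in for human labels
scorer = HybridScorer(calibrate_judge(raw, labels))
print(scorer.score(0.55))  # e.g. {'p_risk': 0.4..., 'needs_human': True}
```

Platt scaling is just one option; the library could equally expose isotonic regression or per-category calibrators, and the review band could be tuned against the cost of human labels.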