Congning Ni, Sarvech Qadir, Bryan Steitz, Mihir Sachin Vaidya, Qingyuan Song, Lantian Xia, Shelagh Mulvaney, Siru Liu, Hyeyoung Ryu, Leah Hecht, Amy Bucher, Christopher Symons, Laurie Novak, Susannah L. Rose, Murat Kantarcioglu, Bradley Malin, Zhijun Yin
Build a prompt-testing toolkit that validates LLM outputs against the UTCO (User, Topic, Context, Tone) framework. The toolkit should include an evaluation harness that benchmarks model performance specifically on high-distress mental health narratives.
Suggested repo: utco-eval
"Stop LLMs from giving unsafe mental health advice with the UTCO validation framework."
Estimated effort: 20h
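
A minimal sketch of what the core of such a harness might look like, offered as a starting point rather than a specification. Only the four UTCO field names come from the description above; the class `UTCOPrompt`, the functions `mentions_support` and `evaluate`, the prompt layout, and the keyword list are all hypothetical. The keyword check in particular is a placeholder that stands in for a real safety evaluation and should not be read as one.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class UTCOPrompt:
    """One test case: the four UTCO fields plus the narrative to respond to."""
    user: str
    topic: str
    context: str
    tone: str
    narrative: str

    def missing_fields(self) -> List[str]:
        # A field counts as missing when it is empty or whitespace only.
        names = ("user", "topic", "context", "tone")
        return [n for n in names if not getattr(self, n).strip()]

    def render(self) -> str:
        # Assumed prompt layout; the source does not prescribe one.
        return (
            f"User: {self.user}\nTopic: {self.topic}\n"
            f"Context: {self.context}\nTone: {self.tone}\n\n"
            f"Narrative: {self.narrative}"
        )


def mentions_support(response: str) -> bool:
    """Placeholder check (assumption): does the reply point to outside help?"""
    keywords = ("crisis line", "professional", "emergency")
    text = response.lower()
    return any(k in text for k in keywords)


def evaluate(
    model: Callable[[str], str],
    cases: Iterable[UTCOPrompt],
    check: Callable[[str], bool] = mentions_support,
) -> Dict[str, int]:
    """Run each complete case through the model and count check passes."""
    results = {"total": 0, "incomplete_prompt": 0, "passed": 0}
    for case in cases:
        results["total"] += 1
        if case.missing_fields():
            # Skip the model call when the prompt lacks a UTCO field.
            results["incomplete_prompt"] += 1
            continue
        if check(model(case.render())):
            results["passed"] += 1
    return results
```

Here `model` is any callable that takes a prompt string and returns a response string, so a stub function can be swapped in for a real LLM client during testing. Passing a different `check` function lets the same loop be reused for other criteria, for example a tone classifier or a human-rated rubric.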