Adam Zewe | MIT News
Build a standalone Python library for measuring LLM calibration via 'verbalized confidence' and logit analysis. It should integrate with Hugging Face models to provide an 'Uncertainty Score' for any generation.
Suggested repo: caliLLM
"Stop trusting hallucinating models—measure their certainty."
Estimated effort: 60h
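A minimal sketch of the scoring logic such a library might contain. All names here (`token_entropy`, `parse_verbalized_confidence`, `uncertainty_score`, the `alpha` blending weight) are illustrative assumptions, not an existing API; in practice the per-token logits would come from a Hugging Face `transformers` call such as `model.generate(..., output_scores=True, return_dict_in_generate=True)`.

```python
import math
import re

def token_entropy(logits):
    """Mean per-token entropy (nats) over a list of per-token logit vectors.

    `logits` is a list of lists: one vector of raw logits per generated token.
    Higher mean entropy means the model spread probability mass more widely.
    """
    total = 0.0
    for step in logits:
        m = max(step)                                # subtract max for numerical stability
        exps = [math.exp(x - m) for x in step]
        z = sum(exps)
        probs = [e / z for e in exps]
        total += -sum(p * math.log(p) for p in probs if p > 0)
    return total / len(logits)

def parse_verbalized_confidence(text):
    """Extract a stated confidence like 'Confidence: 85%' from model output.

    Returns a float in [0, 1], or None if no confidence statement is found.
    The pattern is a deliberate simplification; a real library would need a
    more robust parser (and a prompt that elicits the statement reliably).
    """
    m = re.search(r"confidence[^0-9]*([0-9]{1,3})\s*%", text, re.IGNORECASE)
    if m:
        return min(int(m.group(1)), 100) / 100.0
    return None

def uncertainty_score(logits, text, alpha=0.5):
    """Blend normalized entropy with (1 - verbalized confidence).

    Returns a value in [0, 1]; higher means less certain. Falls back to the
    logit-based signal alone when the model states no confidence.
    """
    vocab = len(logits[0])
    ent = token_entropy(logits) / math.log(vocab)    # normalize entropy to [0, 1]
    verb = parse_verbalized_confidence(text)
    if verb is None:
        return ent
    return alpha * ent + (1 - alpha) * (1 - verb)
```

For example, a uniform logit vector yields maximal normalized entropy (1.0), so with `alpha=0.5` and a stated confidence of 80%, `uncertainty_score` returns 0.6. The equal blend of the two signals is arbitrary; calibrating `alpha` against a labeled hallucination benchmark would be part of the project.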