Fine Tuning + Safety

16.0

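Build a standalone Python library that measures LLM calibration from two signals: verbalized confidence (the confidence the model states in its own output) and logit analysis (the token probabilities in the model's output distribution). It should plug directly into HuggingFace models and return an 'Uncertainty Score' for any generation.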
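The description does not fix a scoring method, so what follows is only a minimal sketch under our own assumptions of the pieces such a library would likely need. These are logit-based uncertainty (mean token negative log-likelihood and entropy), a parser for a verbalized "Confidence: X" statement, Expected Calibration Error (ECE) to check confidences against correctness, and a combined score. Every name here (`logit_uncertainty`, `parse_verbalized_confidence`, `expected_calibration_error`, `uncertainty_score`), the "Confidence: X" prompt format, and the 50/50 weighting are hypothetical. The sketch works on plain lists of per-step logits. With HuggingFace Transformers, per-step scores can be obtained from `model.generate(..., output_scores=True, return_dict_in_generate=True)`, whose `scores` field holds one tensor per generated token.

```python
import math
import re
from typing import List, Optional, Sequence


def _log_softmax(logits: Sequence[float]) -> List[float]:
    # Numerically stable log-softmax over one step's vocabulary logits.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def logit_uncertainty(step_logits: Sequence[Sequence[float]],
                      chosen_ids: Sequence[int]) -> dict:
    """Token-level uncertainty for one generation (hypothetical helper)."""
    if not chosen_ids or len(step_logits) != len(chosen_ids):
        raise ValueError("need one logits vector per generated token")
    nll, entropy = 0.0, 0.0
    for logits, tok in zip(step_logits, chosen_ids):
        logp = _log_softmax(logits)
        nll -= logp[tok]                                   # -log p(chosen token)
        entropy -= sum(math.exp(lp) * lp for lp in logp)   # H of this step's distribution
    n = len(chosen_ids)
    return {"mean_nll": nll / n, "mean_entropy": entropy / n,
            "seq_prob": math.exp(-nll)}                    # probability of the whole sequence


# Assumed prompt convention: the model ends its answer with "Confidence: 80%" or "Confidence: 0.8".
_CONF_RE = re.compile(r"confidence\s*[:=]?\s*(\d{1,3}(?:\.\d+)?)\s*(%)?", re.IGNORECASE)


def parse_verbalized_confidence(text: str) -> Optional[float]:
    """Return the stated confidence in [0, 1], or None if absent or out of range."""
    match = _CONF_RE.search(text)
    if match is None:
        return None
    value = float(match.group(1))
    # Values above 1 (or with a % sign) are read as percentages; a bare "1" is read as 100%.
    if match.group(2) or value > 1.0:
        value /= 100.0
    return value if 0.0 <= value <= 1.0 else None


def expected_calibration_error(confidences: Sequence[float],
                               correct: Sequence[bool], n_bins: int = 10) -> float:
    """Equal-width-bin ECE: weighted mean |avg confidence - accuracy| per bin."""
    if not confidences or len(confidences) != len(correct):
        raise ValueError("need one correctness label per confidence")
    bins: List[List[tuple]] = [[] for _ in range(n_bins)]
    for c, y in zip(confidences, correct):
        if not 0.0 <= c <= 1.0:
            raise ValueError("confidences must lie in [0, 1]")
        bins[min(int(c * n_bins), n_bins - 1)].append((c, float(y)))
    total = len(confidences)
    ece = 0.0
    for b in bins:
        if b:
            avg_conf = sum(c for c, _ in b) / len(b)
            accuracy = sum(y for _, y in b) / len(b)
            ece += len(b) / total * abs(avg_conf - accuracy)
    return ece


def uncertainty_score(step_logits: Sequence[Sequence[float]], chosen_ids: Sequence[int],
                      response_text: str, weight: float = 0.5) -> float:
    """HYPOTHETICAL combined score: 0 = fully confident, 1 = maximally uncertain."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    # Geometric-mean token probability serves as the logit-based confidence.
    logit_conf = math.exp(-logit_uncertainty(step_logits, chosen_ids)["mean_nll"])
    verbal = parse_verbalized_confidence(response_text)
    if verbal is None:
        return 1.0 - logit_conf          # fall back to logits alone
    return 1.0 - (weight * verbal + (1.0 - weight) * logit_conf)
```

ECE is what flags overconfidence: a model that says "90%" on answers it gets right only half the time lands in the top bin with an average confidence of 0.9 and accuracy of 0.5, giving an ECE of 0.4. Comparing ECE for the verbalized and logit-based confidences on the same labeled set shows which signal is better calibrated before any weighting is chosen.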


emerging, implementation gap

evaluation, fine-tuning, safety

Signals (2)

MIT AI, 15d ago

A better method for identifying overconfident large language models

arXiv, 16h ago

Finding and Reactivating Post-Trained LLMs' Hidden Safety Mechanisms