Build a standalone Python library that measures LLM calibration from two signals: 'verbalized confidence' (the confidence a model states in its own output) and logit analysis (the token-level probabilities behind a generation). It should work directly with HuggingFace models and return an 'Uncertainty Score' for any generation.
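
As a starting point, the core measurements could be sketched in plain Python as below. This is a minimal sketch under stated assumptions, not a definitive design: the function names (`sequence_uncertainty`, `parse_verbalized_confidence`, `expected_calibration_error`) are hypothetical, the 'Uncertainty Score' is assumed here to be one minus the geometric-mean token probability, and the verbalized-confidence format ("Confidence: 85%") is an assumed prompt convention. Per-token log-probabilities are taken as plain floats; with HuggingFace models they can typically be obtained by calling `generate` with `output_scores=True` and `return_dict_in_generate=True` and passing the result to `compute_transition_scores(..., normalize_logits=True)`.

```python
import math
import re
from typing import Optional, Sequence

# Assumed output convention: the model is prompted to end its answer with
# something like "Confidence: 85%" or "confidence = 0.85".
_CONF_RE = re.compile(r"confidence\s*[:=]?\s*(\d+(?:\.\d+)?)\s*(%)?", re.IGNORECASE)


def sequence_uncertainty(token_logprobs: Sequence[float]) -> float:
    """Logit-based score in [0, 1]: 1 - geometric-mean token probability.

    This definition is an assumption of the sketch; entropy-based or
    minimum-probability scores are equally plausible choices.
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_logprob = sum(token_logprobs) / len(token_logprobs)
    return 1.0 - math.exp(mean_logprob)


def parse_verbalized_confidence(text: str) -> Optional[float]:
    """Extract a stated confidence as a probability, or None if absent/invalid."""
    match = _CONF_RE.search(text)
    if match is None:
        return None
    value = float(match.group(1))
    # Treat "85%" and a bare "85" as percentages; "0.85" is already a probability.
    if match.group(2) or value > 1.0:
        value /= 100.0
    return value if 0.0 <= value <= 1.0 else None


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Expected calibration error with equal-width confidence bins."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        raise ValueError("need at least one sample")

    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 falls into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, 1.0 if ok else 0.0))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(o for _, o in members) / len(members)
        # Weight each bin's confidence/accuracy gap by its share of samples.
        ece += (len(members) / total) * abs(avg_conf - accuracy)
    return ece
```

In this sketch, calibration is measured by feeding either signal into `expected_calibration_error`: the parsed verbalized confidence, or `1 - sequence_uncertainty(...)` as a logit-based confidence, paired with whether each answer was judged correct. Keeping the functions free of `torch` and `transformers` imports is a design choice for the sketch, so the HuggingFace integration can live in a thin adapter layer.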