Nicholas Moratelli, Christopher Davis, Leonardo F. R. Ribeiro, Bill Byrne, Gonzalo Iglesias
Develop an automated evaluation framework for LVLMs that detects when the model should refuse to answer a question rather than hallucinate. This is critical for reliable RAG pipelines.
Suggested repo: vigilant-vision
"Make your LVLM smart enough to say 'I don't know' when visual evidence is missing."
Estimated effort: 60h
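As a starting point, the core of such a framework can be sketched as a scorer that, given labeled evaluation items, measures how often the model correctly refuses on questions with no visual evidence versus how often it over-refuses on answerable ones. This is a minimal illustration only: the item schema, the keyword-based refusal detector, and all function names below are hypothetical assumptions, not part of the original project description.

```python
# Minimal sketch of a refusal-aware evaluation harness for LVLM outputs.
# Assumptions (hypothetical, for illustration): each item carries the model's
# answer string and a boolean 'answerable' label; refusals are detected with
# a simple keyword heuristic. A real framework would use a stronger detector
# (e.g., an NLI model or an LLM judge).

REFUSAL_MARKERS = (
    "i don't know",
    "cannot be determined",
    "not visible",
    "no visual evidence",
)

def is_refusal(answer: str) -> bool:
    """Heuristic check: does the answer read as a refusal/abstention?"""
    lowered = answer.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS)

def score(items: list[dict]) -> dict:
    """items: dicts with 'answer' (model output) and 'answerable' (bool).

    Returns:
      refusal_recall    - fraction of unanswerable items the model refused
      over_refusal_rate - fraction of answerable items the model refused
    """
    unanswerable = [x for x in items if not x["answerable"]]
    answerable = [x for x in items if x["answerable"]]
    correct_refusals = sum(is_refusal(x["answer"]) for x in unanswerable)
    over_refusals = sum(is_refusal(x["answer"]) for x in answerable)
    return {
        "refusal_recall": correct_refusals / max(len(unanswerable), 1),
        "over_refusal_rate": over_refusals / max(len(answerable), 1),
    }
```

A useful model would push `refusal_recall` high while keeping `over_refusal_rate` low; reporting both guards against a degenerate model that refuses everything.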