Vedant Jawandhia, Yash Sinha, Murari Mandal, Ankan Pal, Dhruv Kumar
Develop a robustness benchmark tool that tests whether an LLM solves a math problem regardless of how that problem is phrased. This will expose fragility in current reasoning benchmarks.
Suggested repo: georep-eval
"Is your math model reasoning, or just pattern matching?"
Estimated effort: 35h
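The core loop of such a benchmark could be sketched as follows. This is a minimal illustration, not the project's actual design: the paraphrase list, the `robustness_score` helper, and the `brittle_model` stub are all hypothetical, and a real harness would call an actual LLM and normalize its answers.

```python
from typing import Callable

# Hypothetical paraphrases of one underlying problem; a real benchmark
# would generate or curate many such variant sets.
PARAPHRASES = [
    "What is 12 multiplied by 7?",
    "A box holds 12 items. How many items are in 7 boxes?",
    "Compute the product of twelve and seven.",
]

def robustness_score(model: Callable[[str], str],
                     variants: list[str],
                     expected: str) -> float:
    """Fraction of semantically identical phrasings answered correctly.

    A score below 1.0 suggests the model is pattern matching on surface
    form rather than reasoning about the underlying problem.
    """
    correct = sum(1 for v in variants if model(v).strip() == expected)
    return correct / len(variants)

# Toy stand-in for an LLM call: only recognizes one surface form.
def brittle_model(prompt: str) -> str:
    return "84" if "multiplied" in prompt else "unsure"

print(robustness_score(brittle_model, PARAPHRASES, "84"))  # → 0.3333333333333333
```

A fully robust model would score 1.0 on every variant set; the gap between per-variant accuracy and whole-set consistency is the fragility signal the benchmark would report.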