Suhaas Garre, Erik Knutsen, Sushant Mehta, Edwin Chen
Create an open-source evaluation suite for 'Moonshot Mathematics' to test models beyond standard olympiad problems. Focus on complex, multi-layered problem sets.
Suggested repo: riemann-eval
"The ultimate stress test for advanced mathematical reasoning."
Estimated effort: 40h