New math benchmark reveals AI models confidently solve problems that have no solution

AI · The Decoder · 1h ago

A consortium of 64 mathematicians built SOOHAK, a new AI benchmark with 439 handwritten tasks, including 99 that are deliberately unsolvable. Google's Gemini 3 Pro leads on research-level problems at 30 percent. But no model cracks 50 percent on spotting broken tasks.

Source: The Decoder
