Rapid progress in AI & Math : Pay attention to 3 things.

Less than 7 months ago, I wrote a post about AI and Math. Here are some statements pulled from it**.

#1 "..going from LLMs parroting Wikipedia (and failing to add two simple numbers accurately) to Olympiad medal-level performance in two years is simply astounding."

#2 "I'd venture to say that less than 0.1% of PhDs will be able to solve even one problem. "

#3 "When the scoring is trustworthy and the proof feedback is good, it can accelerate discovery, document failures, and occasionally perhaps surface genuinely new structures..."
-----
Since then, a lot has happened in AI for Math (many years of progress if normalized by how science usually moves), but I didn't say anything because I was waiting for a really big one. Well.. recently, one of OpenAI's models cracked a hard Erdos problem. The Wall Street Journal has an unusually good article on this, so I'll just quote from it (in blue, so you can ignore my commentary).

With regard to #1 above: “Forget one year ago,” this would be impossible “a month ago.”
I've always said.. pay attention to the speed of progress, *and* I've recently said that GPT-5.5-pro and Opus 4.7 are scary good. In fact, the jump from GPT-5 to GPT-5.2 was huge, and the jump from 5.2 to 5.5 was even bigger. This is not a plateau...

With regard to #2, well.... from 0.1% of PhDs, we can now go to 0% of PhDs.. and it cost only a few thousand dollars.
"If a human had written the paper and submitted it to the Annals of Mathematics and I had been asked for a quick opinion, I would have recommended acceptance without any hesitation. No previous AI-generated proof has come close to that.” — Timothy Gowers, Fields Medalist

With regard to #3, time to acknowledge that AI models are better than us at some aspects of science:
"While mathematicians tend to focus on their specific areas of expertise, AI models use their vast knowledge to spot connections that we couldn’t possibly see ourselves. In this case, that meant pulling from both algebraic number theory and discrete geometry, which have about as much in common as the marathon and pole vault.

“It’s the kind of idea that you try for a bit, it doesn’t work, and you think maybe you were just too hopeful,” said Mark Sellke, a Harvard statistician at OpenAI. “So you give up and move on.”

AI doesn’t move on. It keeps plugging away without taking breaks to eat, sleep, answer emails, pick the kids up from school and watch the Knicks."

One more point: this is news at all only because benchmarks like the Erdos problems exist. Math is of course the field with the best benchmarks. Science is harder. But we (and other groups) are trying... watch this space.

Edit on June 12: Here is the latest on Frontier Math Tier 4, which is perhaps the most rigorous benchmark we have. It is also mostly blind (companies don't have access to the answers... at least Anthropic doesn't, and they crushed the test). Tier 4 consists of research-level problems formulated by mathematicians, and I'd guess 99% of mathematicians won't be able to solve them (and those who can would take months).

===
** This is not for an ego boost. One of the main reasons this blog exists is to highlight how quickly things are evolving. So this is a benchmarking exercise in that regard.
