Less than 7 months ago, I wrote a post about AI and Math.. here are some statements pulled from it**:
#1 "..going from LLMs parroting Wikipedia (and failing to add two simple numbers accurately) to Olympiad medal-level performance in two years is simply astounding."
#2 "I'd venture to say that less than 0.1% of PhDs will be able to solve even one problem."
#3 "When the scoring is trustworthy and the proof feedback is good, it can accelerate discovery, document failures, and occasionally perhaps surface genuinely new structures..."
-----
Since then, a lot has happened in AI for Math (many years of progress if normalized by how science usually moves), but I didn't say anything because I was waiting for a really big one. Well.. recently, one of OpenAI's models cracked a hard Erdős problem. The Wall Street Journal has an unusually good article on this, so I'll just quote from it.. in blue, so you can ignore my commentary.
With regard to #1 above: “Forget one year ago,” this would be impossible “A month ago.”
I've always said.. pay attention to the speed of progress *and* I've recently said that GPT-5.5-pro and Opus 4.7 are scary good. In fact, the jump from GPT-5 to GPT-5.2 was huge, and the jump from 5.2 to 5.5 was even bigger. This is not a plateau...
With regard to #2, well.... from 0.1% of PhDs, we can say 0% of PhDs.. and it only cost $1000.
“If a human had written the paper and submitted it to the Annals of Mathematics and I had been asked for a quick opinion, I would have recommended acceptance without any hesitation. No previous AI-generated proof has come close to that.” — Timothy Gowers, Fields Medalist
With regard to #3, time to acknowledge that AI models are better than us at some aspects of science:
"While mathematicians tend to focus on their specific areas of expertise, AI models use their vast knowledge to spot connections that we couldn’t possibly see ourselves. In this case, that meant pulling from both algebraic number theory and discrete geometry, which have about as much in common as the marathon and pole vault.
“It’s the kind of idea that you try for a bit, it doesn’t work, and you think maybe you were just too hopeful,” said Mark Sellke, a Harvard statistician at OpenAI. “So you give up and move on.”
AI doesn’t move on. It keeps plugging away without taking breaks to eat, sleep, answer emails, pick the kids up from school and watch the Knicks."
One more point.. this is news at all only because benchmarks like the Erdős problems exist. Math is of course the field with the best benchmarks. Science is harder. But we (and other groups) are trying... watch this space.
===
** This is not for an ego boost. One of the main reasons this blog exists is to highlight how quickly things are evolving. So this is a benchmarking exercise in that regard.