The final few feet: It is the scientific interaction that really matters (right now at least).

Matthew Schwartz's blog post "Vibe Physics" (don't be put off by the title) excites me because it is the clearest writeup I have seen of how AI models can impact theoretical science. The tipping point is here, but the mode of interaction is something that AI technologists, the general public, and many scientists are not paying much attention to.

Schwartz, a Harvard professor of physics and author of a well-referenced book on Quantum Field Theory, guided Claude4.5 (he also used GPT5.2 to cross-check and to get some results) through a real theoretical physics problem from start to finish, "without touching a file" himself. The outcome is a paper that contains a new factorization theorem. This is not a toy demo, and I confirmed that with my physics colleagues.

He says:
"For this project (which I completed with Claude in two weeks), I’d estimate that it would have taken me and a grad student 1-2 years, and me without AI around 3-5 months. Ultimately, it accelerated my own research tenfold. That’s game-changing!"

Matthew did not build an autonomous agent or set up a pipeline. He sat with the model, directed it, caught it when it got things wrong, pushed it when it took shortcuts, and steered it through a landscape that only he could navigate. The result appears to be a rigorous contribution to quantum field theory. The method is a professor doing what professors do: "advising, checking, exercising taste", except that the "student" works at 3 AM without complaint and iterates through 110 drafts.

I recently asked Jesse Thaler (MIT physicist, and PI of IAIFI, which Matthew is also part of) what he could do if he deployed 10,000 agents simultaneously (one of my Michigan colleagues does this), and he said his primary mode of interaction is also pretty much at the chatbot level! That doesn't mean it's the best approach currently, but it also doesn't mean 10k agents is a great idea right now for this class of problems.

This distinction matters enormously, and I think it gets lost in the current discourse. Over the past year, we've seen a parade of "AI scientist" announcements. Matthew seems politely skeptical of these. I am not as negative, because the kind of problem he tackles is special, but there is a key message:

I wrote recently about the idea that AI brings anyone within X feet of any problem. Matthew's post is one of the cleanest illustrations of this. Claude understands the setting, does the integrals, writes the code interfaces, generates the Monte Carlo events, produces the LaTeX file, and synthesizes the literature. All of that gets to within X feet. The last X feet took Claude plus hours of his own expert verification, and there is no scaffolding or agent that substitutes for it. Yet. In his problem.

There is also a message for groups such as Firstproof. Putting together really hard problems and benchmarking autonomous AI on them is an admirable effort. But note that an expert with good sense can still get a great deal of acceleration by gently guiding AI in an almost (but not quite) autonomous way.

In the AjoGI post, I argued that we should focus on what these models do, operationally, rather than getting hung up on what they "are." What they do, right now, is amplify a domain expert's reach by a lot, if used well. The jaggedness is real.

Matthew describes running four or five projects in parallel, moving between windows, checking output, sending prompts. This is the Golden Age of the PI I wrote about: an expert with taste, directing a tireless, enormously capable, sometimes lazy or sloppy tool, covering ground that would have taken years. That's not autonomy. I've felt this strongly myself: agents can track my frequently interrupted line of thinking and provide continuity.

There's the obvious hard question underneath all of this, and Matthew raises it too: what happens to grad students? Also, if the "G2-level" work is automated, how do students build the intuition? I keep coming back to the friction question in my previous post: "The struggle of deriving something by hand, debugging line by line, reading a paper three times—that's how intuition is built. Remove it entirely and you get people who can't cover the last 100 feet. Keep all of it and you get people who are 10x* less productive than they could be. The balance is everything and we don't know where it is."

We need to figure out which friction to preserve, and that's an institutional design problem as much as a pedagogical one. Well... we're talking about some related aspects tomorrow.

Matthew says this may be the most important paper he's ever written because of the method. These tools are already at the point of transforming** (or at least greatly accelerating) theoretical sciences and math***, and the cognitive part of all of science.


* Funny that within a week of that post, Matthew says his work was accelerated 10x!
** Transformation in most problems requires blowing past bottlenecks that aren't just cognitive.
*** These are fields where the reward signal can be efficiently generated.