My previous post was about AI and Bottlenecks in science and engineering. This post looks at a similar landscape from a different angle: where might scientific value (perceived and real) actually be created as AI climbs up the capability ladder?
Relative to the last post (and probably most posts in this blog), I will be much more speculative.
One point I want to mention: This is not the world as I want it, but this is the world as I see it.
1. A (very rough) ladder of capabilities
I think of the last decade of AI as walking up a ladder:
Correlations → Compression → [Unstructured] data synthesis → Language → Logic → Statistics → Math → Theoretical Physics → Applied Physics → beyond
We started with systems that could fit curves and compress signals, and we are now seeing early signs of models that can handle logic, do statistics, write proofs, and be genuinely helpful in mathematical and physical reasoning.
It is not crazy to imagine AI systems that can:
- Propose nontrivial lemmas, not just complete human-written ones.
- Suggest closures, asymptotics, or reduced models in physics.
- Construct usable surrogates and controllers.
What this will do is lower the intellectual barrier to entry in every field. But at least for the next few years (I am not saying that it will change after that; I just refuse to make a prediction beyond that), human intelligence/expertise will (obviously) still be a differentiator (just not as much).
In other words, the bar for participation will drop; the bar for meaningful contribution will rise.
For reference, here is a crude experiment on what an AI model could do in 09/24, and here is a more thorough study in 11/25.
2. When intelligence is cheap but reward is not
The scientific (and engineering) process will be accelerated and increasing chunks of it will be automated, but over the next few years progress will be uneven and user/problem/domain-dependent, so some of the polarized discussions around AI will continue.
In the last post, I split constraints into deep intelligence, routine labor, experiments/data, infrastructure, and verification/regulation/rewards. AI is clearly eating into the first two. But one theme keeps coming back:
Model capability is limited by the quality and availability of reward signals.
We have systems that are very good at optimizing whatever we can define and measure cheaply:
- In games, the reward is explicit and fast.
- In protein structure prediction, we had carefully curated data and clear objectives.
- In short-term weather prediction, we have dense historical data and unambiguous forecast errors.
In many areas of science and engineering, the hard part is not running the optimizer; it is deciding what to optimize and, more importantly, how to verify/measure it at scale. That means designing experiments, building instruments and simulations, encoding constraints, and wrestling with the messy problems where the metric fails but the real world still demands an answer.
So while intelligence is becoming cheap on average, reward signal design, and the infrastructure needed to support it, are not. That’s where a growing share of scientific value will live. I think.
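To make the point concrete, here is a toy sketch (not from any real system; the functions and numbers are made up) of what happens when an optimizer is handed a cheap proxy reward instead of the expensive true objective it is supposed to stand in for. The optimizer does its job perfectly; the problem is in the reward.

```python
# Toy illustration: a cheap proxy reward versus the expensive "true" objective.
# Everything here is hypothetical; the bias term is deliberately exaggerated.
import numpy as np

def true_objective(x):
    # Stand-in for something expensive: an experiment or a high-fidelity run,
    # affordable only for a handful of candidates.
    return -(x - 2.0) ** 2

def proxy_reward(x):
    # A cheap, imperfect surrogate: correlated with the true objective,
    # but with a systematic bias that grows with x.
    return -(x - 2.0) ** 2 + 2.0 * x

# "Intelligence is cheap": the optimizer happily maximizes whatever we hand it.
candidates = np.linspace(-5.0, 10.0, 2001)
best_by_proxy = candidates[np.argmax(proxy_reward(candidates))]
best_by_truth = candidates[np.argmax(true_objective(candidates))]

print(f"optimum under proxy reward  : x = {best_by_proxy:.2f}")
print(f"optimum under true objective: x = {best_by_truth:.2f}")
print(f"true value achieved by proxy optimum: {true_objective(best_by_proxy):.2f}")
```

The gap between the two optima is entirely a reward-design failure, and closing it requires exactly the expensive work described above: better measurements, better experiments, better verification.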
A good chunk of academic papers will still be published in the cheap and accurate reward signal regime, but we will cover that later in this post. For now, as an example, here is a paper of mine from a couple of years ago that took me more than 200 hours of derivation. I am proud of the work that I put in, but with modern tools, the time would be reduced to less than 25 hours. I fully expect this to shrink much further next year.
3. A sketch of the scientific value chain
A crude value chain might look like:
a) Framing: observing a phenomenon, deciding which questions matter, and connecting them to physical or technological stakes.
b) Representation and modeling: choosing variables, symmetries, scales, and approximations; deciding which complexity to ignore and which to keep.
c) Computation and experimentation: running simulations, building and operating instruments, generating data (including via AI-driven surrogates and robotic experimentation).
d) Verification and reward: deciding what counts as “good” or “correct,” building tests, benchmarks, and safety criteria, and feeding that back into training or design.
e) Deployment and scaling: pushing results into products, policies, workflows, and infrastructure, and surviving real-world constraints and failures.
Today’s AI systems mostly sit in stages (b) and (c): helping with modeling, coding, algebra, optimization, and data analysis. As capability pushes further along the ladder (into math, physics, and domain-specific reasoning), they will take larger bites out of those steps.
What remains harder to automate are the “edges” of the chain:
- The front end, where problems are framed, abstractions are chosen, and vague questions are turned into tractable ones.
- The back end, where outputs are embedded into reality and subjected to regulation, economics, and human preferences.
That is where human scientists and engineers will increasingly differentiate themselves. My national lab colleague Juston Moore calls it (in a different context) 'things that cannot be formalized easily'.
4. How scientists' value proposition might shift
If we accept that the intellectual barrier to entry is dropping, and that a lot of routine labor will be automated, what remains as a differentiator? Some examples:
a) Complexity and composition.
Ability to compose many hard/imperfect components into a robust system. Example: handling multi-physics and multi-scale interactions where naive compositions fail. This is not the same skill as "solve this PDE"; it is more architectural. As examples, I look at capabilities that some of my U. Michigan and national lab colleagues have (e.g., to predict inertial confinement fusion physics)... these are really hard, sometimes tied together by loose logic (not because the practitioners aren't rigorous, but because the problem is hard!). AI isn't going to be writing and composing these kinds of solvers any time soon without extensive and long-term interrogation by the scientist, coupled to feedback from reality.
b) Tools, instruments, and the ability to generate data.
Owning the means of data and feedback generation becomes central.
- A robotic lab or experimental platform that can generate high-quality, targeted data.
- Unique simulation pipelines, possibly spanning fidelities, that can cheaply explore relevant regimes (a toy sketch follows below).
- The ability to shape the input distribution and reward landscape that future models will be trained on.
As generic models become widely available, these instruments and pipelines are the scarce capital.
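As a purely hypothetical sketch of what "spanning fidelities" can mean in practice: a cheap low-fidelity simulator screens many candidates, and the scarce high-fidelity budget (an experiment or a fine-grid run) is spent only on the shortlist. The function names and toy models below are illustrative, not real codes.

```python
# Hypothetical two-fidelity data-generation loop. Both "simulators" are
# made-up analytic stand-ins; only the structure of the loop matters.
import numpy as np

rng = np.random.default_rng(1)

def low_fidelity(x):
    # Fast, biased surrogate of the quantity of interest.
    return np.sin(x) + 0.3 * x

def high_fidelity(x):
    # Expensive "ground truth" (stand-in for an experiment or fine-grid run).
    return np.sin(x) + 0.3 * x - 0.1 * x ** 2

high_fidelity_budget = 5                    # we can only afford a handful of these
candidates = rng.uniform(0.0, 4.0, size=200)

# Stage 1: screen everything with the cheap model.
scores = low_fidelity(candidates)
shortlist = candidates[np.argsort(scores)[-high_fidelity_budget:]]

# Stage 2: spend the scarce budget only on the shortlist.
dataset = [(x, high_fidelity(x)) for x in shortlist]
for x, y in dataset:
    print(f"x = {x:.2f}  high-fidelity y = {y:.2f}")
```

The interesting (and hard) part is everything this sketch hides: how good the cheap model is, how the shortlist is chosen, and who owns the high-fidelity instrument.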
c) Reward signal design and evaluation.
Defining what “good” means under messy and competing objectives: performance, stability, safety, cost, etc.
Turning that vector into training objectives, evaluation suites, and decision rules is itself a scientific and political act, and people who can straddle domain science, statistics, and AI will be well positioned to do it. I am less certain about this than the other points above.
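Here is a minimal, hypothetical sketch of what that act looks like at the code level: folding competing objectives (performance, stability, cost) into a scalar reward, with safety kept as a hard constraint rather than a trade-off. The weights and thresholds are invented; choosing them is precisely the scientific and political judgment described above.

```python
# Hypothetical multi-objective reward: weights and thresholds are made up.
from dataclasses import dataclass

@dataclass
class Evaluation:
    performance: float    # higher is better
    stability: float      # higher is better
    safety_margin: float  # must exceed a hard threshold
    cost: float           # lower is better

WEIGHTS = {"performance": 1.0, "stability": 0.5, "cost": -0.2}
SAFETY_THRESHOLD = 0.1    # hard constraint, never traded off

def reward(e: Evaluation) -> float:
    # Fail outright if the safety constraint is violated.
    if e.safety_margin < SAFETY_THRESHOLD:
        return float("-inf")
    return (WEIGHTS["performance"] * e.performance
            + WEIGHTS["stability"] * e.stability
            + WEIGHTS["cost"] * e.cost)

print(reward(Evaluation(performance=0.9, stability=0.7, safety_margin=0.3, cost=1.2)))
print(reward(Evaluation(performance=1.5, stability=0.9, safety_margin=0.05, cost=0.8)))
```

Every number in that snippet encodes a value judgment, and those judgments are exactly what a model cannot supply on its own.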
All of this said, make no mistake: AI will still help with the above; it is just that the differential value of these skills may be higher than that of an elegant derivation or algorithm.
Another way to think about it: might value perceptions shift from thinkers and enablers to problem solvers?
5. Publications and the continuing devaluation
This brings us to a most sensitive topic: publications.
If AI models can suggest arguments, fill in algebra, clean up prose, summarize and cross-reference thousands of papers in seconds, and generate decent "related work" and "limitations" sections, then the journal (or conference) paper as the primary unit of academic/scientific value is in trouble.
I am not suggesting that AI models do this really well right now, but they're getting better and better. For the record, in the last couple of papers I have written, model performance has been pedestrian overall. In one recent paper, even the best frontier model kept getting stuck on a wrong narrative, but once I figured out the right formulation, the model was able to jump out of its local minimum and produce value. In general, models are getting really good in-context as long as the chains aren't too long. This is a topic for a different day.
Add to this the fact that some AI models generate better peer-review comments than perhaps the average reviewer, even for top journals (and conferences). This is partly a consequence of the fact that no one seems to have time these days. Sad, but true.
This doesn't mean that journals will vanish in the next few years. Rather:
a) The time and cost of producing papers goes down. In fact, this has been happening for decades now.
b) The signal-to-noise ratio in the literature gets worse. In fact, this has been happening for decades now.
c) The incremental informational value of “one more static document” drops as models learn to interpolate and generalize from the corpus.

Former U-M professor Samuel Goudsmit (best known for the concept of electron spin) founded Physical Review Letters and, given that the spin paper was one page, was appalled at the growth in the number of pages (to four) and in the number of articles, and said as much around 1970.
So the formal value of publications will drift toward zero. What could potentially matter more are:
1. Reusable artifacts: datasets, benchmarks, codes, instruments, and models that others actually use.
2. Pipelines and deployments: systems that work outside toy problems and survive contact with messy reality.
3. Reputation for reliability: groups whose results, when reproduced and extended, tend to hold up.
- Some (not all) of the most respected academicians in most fields aren't known for #1 and #2.
- All of the above said, even under these circumstances, it is still valuable to educate students on the fundamentals and train them to think clearly and write good papers! Even if AI tools get a lot better, outcomes will be underwhelming without good fundamentals. But we need good peer review for this system to function, and a proper educational setup. It will be a massive challenge.
6. Fluid dynamics, complex physics, and “beyond”
It is tempting to look at the final steps on the ladder
Math → Theoretical Physics → Fluid Dynamics, etc. → beyond
and imagine a clean sequence: once AI "solves" math and theory, difficult application domains fall like dominoes.
Reality is messier. Domains like turbulence, MHD, plasma physics, climate, and complex materials are not just about solving well-posed problems:
- The governing equations are often known but intractable; the relevant regimes are high-dimensional, multi-scale, and data-poor.
- Much of the action is in modeling choices, closure assumptions, and coupling to surrounding systems (geometry, chemistry, control).
- Verification and validation are hard.
This is why these problems have been around for decades; it is not a lack of "intelligence", although more intelligence and automation will clearly accelerate progress.
So what can be done? Advances in computational science and AI will help, but the hardest parts of the scientific value chain in these domains (choosing the right questions, connecting to reality, and ensuring robustness) will still be challenging.
“Beyond” here doesn’t mean “beyond physics.” It means beyond what can be cleanly benchmarked and optimized with existing reward signals. That frontier is where a lot of the interesting work will be. Below is one example of that.
I have a lot more to say, and I am tempted to add many more ifs and buts and postscripts, but this is a blog post, so I will stop here.
