Ode to relentless engineering: Orders-of-magnitude improvement

Thomas Zacharia's (SVP @ AMD, former Director of Oak Ridge National Lab) talk at TPC26 was excellent, and one thing he stressed was that we shouldn't forget the lessons of HPC when thinking about AI.

One of the things he said was: "Exascale for 20MW sounded ridiculous."

Even 15 years ago, if you took the machines of that era and extrapolated naively, an exascale computer looked like it would need more than 1 GW of power. When I taught my first CFD class at U. Michigan (2014), I used to show a picture of the Hoover Dam to illustrate what exascale would take (and to make the case for why we need better models and numerical methods).

It wasn't just me. From a 2008 article: "Jaguar uses 7 megawatts of power, but an exascale system that uses CPU processing cores alone could take 2 gigawatts, says IBM's Dave Turek. 'That's roughly the size of [a] medium-sized nuclear power plant,' he says. 'That's an untenable proposition for the future.' Finding a way to reduce power consumption is key to developing an exascale computer."

Today's exascale scientific computing machines come in at around 20 MW, a factor of 100 lower than that 2 GW projection.

I find that this kind of improvement is hard for people (scientists included) to internalize. We are simply not very good at imagining engineering.


I don’t mean generic techno-optimism or magic breakthroughs. I mean the relentless, system-level work of changing the constants in front of the scaling laws. Better chips, memory hierarchies, interconnects, cooling, compilers... Better co-design between applications and hardware (which is what Thomas Zacharia highlighted).

I feel the heroic efforts of DOE engineers and scientists in delivering big machines (and helping companies do the same) are worthy of a NYTimes bestseller. I told Thomas that he should write a book. Topic for another day.

Cost of intelligence
Total demand for AI may well rise enormously. Grid constraints are real and already choking buildout. Water, land, transmission, chips, supply chains... all real. No serious person should hand-wave them away. Anything that increases GDP costs energy and resources ($1 on energy = $4 of GDP).

But the cost of a “unit of intelligence” is collapsing. Stanford’s AI Index reports that the cost of querying a model of equivalent capability (roughly GPT-3.5 level) fell from about $20 per million tokens in late 2022 to about 7 cents per million tokens by late 2024, a drop of more than 280-fold. In the meantime, frontier capabilities have increased a lot. Epoch AI and others find a similar trend, though the exact rate depends on the capability being measured.

No one knows how much further this will go. What is the trajectory of cost per useful task? What is the energy per useful answer? What is the cost per verified reasoning step? What is the marginal power draw of a million agent-hours? How large will the algorithmic efficiency gains be? What happens when models are distilled, specialized, cached, sparsified, quantized, compiled, routed, and embedded in workflows that do not call the giant model for every trivial step?

Beyond Computing
This pattern is not unique to computing. Lithium-ion batteries are a great example. In the early 1990s, lithium-ion cells cost thousands of dollars per kWh. By 2024, cell prices were below $100/kWh. Depending on whether one looks at cells or packs, and which dates one uses, the decline is roughly one to two orders of magnitude. EV battery pack costs fell about 90% between 2008 and 2023.

Solar is similar. Utility-scale solar went from absurdly expensive to one of the cheapest sources of new electricity in many places. NREL’s benchmark for U.S. utility-scale PV LCOE fell from about 35 cents/kWh in 2010 to under 5 cents/kWh in 2024. Just engineering.
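The following sketch puts the examples in this post on a common scale. It uses a hypothetical helper, `decline_stats`, which is not from any library. The helper turns a start/end pair into three things: a ratio, the orders of magnitude (log10 of the ratio), and, where a time span is given, the compound annual rate of decline. The inputs are the rounded figures quoted above, with some simplifying assumptions:

- "Under 5 cents" is treated as 5 cents.
- The EV pack's 90% drop is normalized as 1.0 to 0.1.
- The 2 GW exascale figure was a projection, not a measured machine, so no annual rate is computed for it.

```python
import math


def decline_stats(start, end, years=None):
    """Summarize a decline from `start` to `end` (same units).

    Returns (ratio, orders_of_magnitude, annual_decline). annual_decline is
    the compound fraction shed per year, or None if no time span is given.
    Hypothetical helper for illustration only.
    """
    ratio = start / end              # how many times smaller/cheaper
    ooms = math.log10(ratio)         # orders of magnitude = log10(ratio)
    annual_decline = None
    if years:
        # Solve end = start * (1 - d) ** years for the per-year decline d.
        annual_decline = 1 - (end / start) ** (1 / years)
    return ratio, ooms, annual_decline


# Rounded figures quoted in this post (simplifying assumptions noted above).
EXAMPLES = [
    # (label, start, end, years or None)
    ("Exascale power: 2 GW estimate -> 20 MW", 2e9, 20e6, None),
    ("AI query cost, $/M tokens (2022-2024)", 20.0, 0.07, 2),
    ("Utility PV LCOE, cents/kWh (2010-2024)", 35.0, 5.0, 14),
    ("EV pack cost, normalized (2008-2023)", 1.0, 0.1, 15),
]

if __name__ == "__main__":
    for label, start, end, years in EXAMPLES:
        ratio, ooms, annual = decline_stats(start, end, years)
        line = f"{label:<42} {ratio:8.1f}x  {ooms:4.2f} OOM"
        if annual is not None:
            line += f"  ~{annual:.0%} lower per year"
        print(line)
```

The log10 column makes very different quantities comparable. On these rounded inputs, the token-cost drop works out to about 2.5 orders of magnitude in two years, or roughly 94% cheaper per year. Solar LCOE falls a bit under one order of magnitude over 14 years, or roughly 13% per year.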

So, back to the topic...
Intelligence could become abundant, embedded, and increasingly efficient. Verification could become the bottleneck. If there is a lesson from exascale, batteries, solar, and now AI inference, it is that engineering is routinely underestimated. Naive extrapolation assumes the system will scale by becoming a bigger version of itself; it misses the redesign. AI training and inference will not become cheap only because computing gets more efficient. They will become cheap because the entire stack will get much more efficient.

The high-level OOM (orders-of-magnitude) plot from Leopold Aschenbrenner is quite relevant here. When it came out in June 2024, it sounded even more ridiculous than 'Exascale at 20MW', but here we are... We live in incredibly interesting times.

---------------------------
Now... there is certainly some sampling bias in the three other examples I picked. There have been promising technologies that never scaled, but my counter to this is:

a) Efficiency and capability have improved by 2 orders of magnitude in 2-3 years!
b) We're still so early

Computation, AI, Science... and just about everything else
