Why Your Projections About Global Compute Are Totally Wrong

Silicon Valley has a math problem, and it's costing trillions of dollars. If you listen to the leading tech executives or look at the staggering capital expenditure charts from Goldman Sachs, you'll hear a simple narrative. We need more data centers. We need millions of specialized chips. We need enough electricity to power entire nations just to keep artificial intelligence growing.

Goldman Sachs projects $7.6 trillion in cumulative capital investment between 2026 and 2031 across compute, data centers, and power grid infrastructure. Big tech firms are buying up old nuclear plants and trying to secure gas turbines that are already backordered through 2029. Everyone assumes the demand for computational power is a bottomless pit.

They're missing the point. The question isn't just about how many gigawatts we can squeeze out of the electric grid before it snaps. The real issue is that our current approach to calculating global compute needs relies on outdated assumptions about how software works. We are building an absurdly oversized infrastructure based on inefficient code, and the market is about to hit a wall.

The Inversion of Training and Inference

For the last few years, the tech industry focused almost entirely on training. Building a frontier model meant gathering massive datasets, buying 100,000 Nvidia H100 GPUs, and running them at full blast for months. The compute required for these training runs has grown roughly fivefold per year since 2020.

That era is ending. Training demand is slowing down because we're running out of high-quality human text to feed these systems.

The real shift in 2026 is toward inference: running the models after they're built. Deloitte projects that inference workloads will claim roughly two-thirds of all AI compute this year. That is a total flip from the early days of generative AI.

But here is where the math gets messy. We aren't just running simple queries anymore. Newer reasoning models use what researchers call test-time compute. When you ask an advanced model a tough programming or math question, it doesn't just spit out the first word it calculates. It stops. It thinks. It runs internal loops, corrects its own mistakes, and evaluates different pathways before giving you an answer.

This requires a massive amount of computational power per query. Asking a model to spend fifteen minutes thinking through a complex scientific proof uses over 100 times the compute of a simple email summary. If every search engine query turned into a deep reasoning session, the global energy grid would collapse by next Tuesday.
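
To make that arithmetic concrete, here is a minimal sketch of how average compute per query shifts as more traffic moves to deep reasoning. It assumes a simple query costs 1 unit and a reasoning query costs 100 units, treating the 100x figure above as a round number. The function name and the traffic shares are illustrative, not measured data.

```python
def blended_compute(reasoning_share: float, reasoning_cost: float = 100.0) -> float:
    """Average compute per query, in units where a simple query costs 1.0."""
    if not 0.0 <= reasoning_share <= 1.0:
        raise ValueError("reasoning_share must be between 0 and 1")
    simple_share = 1.0 - reasoning_share
    # Weighted average of the cheap path (cost 1.0) and the reasoning path.
    return simple_share * 1.0 + reasoning_share * reasoning_cost


if __name__ == "__main__":
    # Illustrative traffic mixes, not real-world measurements.
    for share in (0.0, 0.01, 0.10, 1.0):
        print(f"{share:>4.0%} reasoning traffic -> {blended_compute(share):6.2f}x baseline")
```

Under these assumptions, sending just 10% of queries down the reasoning path raises average compute per query roughly elevenfold, which is why the mix of traffic matters more than the raw query count.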

We don't need a blanket increase in global compute for everything. We need to understand when to use heavy computation and when to turn it off. Most daily tasks don't require an artificial brain running at maximum capacity.

The Efficiency Myth and Algorithmic Breakthroughs

Hardware companies love to talk about chip efficiency. They point out that performance per dollar improves by about 37% each year. They claim that newer architectures will naturally solve the energy crisis.

Hardware won't save us. The growth rate for AI compute demand is running at more than double the speed of Moore's Law. Silicon optimization cannot keep up with an exponential curve that doubles every few months.

The real savior is smarter math. Algorithm efficiency is the ignored variable in the global infrastructure equation.

Look at what happened with model architectures recently. The research group Epoch AI found that while compute stocks are growing rapidly, pre-training algorithmic efficiency improves by roughly three times per year. This means researchers can reach the same model performance with roughly a third of the compute they needed a year earlier.

We saw this clearly with the rise of sparse architectures like DeepSeek V3. Instead of activating every parameter in a massive model for every token, sparse models trigger only a small fraction of the network. That cuts the per-token computational cost to a sliver of what a comparable dense model requires.
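
The idea of triggering only part of the network can be sketched with a toy top-k router. This is a generic illustration of sparse expert routing, not DeepSeek's actual implementation: the experts here are plain functions on a single number, and the gate scores are hard-coded rather than learned. All names are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Expert = Callable[[float], float]


def top_k_route(gate_scores: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    ranked = sorted(range(len(gate_scores)), key=lambda i: gate_scores[i], reverse=True)
    chosen = ranked[:k]
    exps = [math.exp(gate_scores[i]) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


def sparse_forward(x: float, experts: Sequence[Expert],
                   gate_scores: Sequence[float], k: int) -> float:
    """Run only the selected experts and combine their outputs by weight."""
    return sum(weight * experts[i](x) for i, weight in top_k_route(gate_scores, k))


if __name__ == "__main__":
    calls = []

    def make_expert(scale: float) -> Expert:
        def expert(x: float) -> float:
            calls.append(scale)  # record which experts actually ran
            return scale * x
        return expert

    experts = [make_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]
    output = sparse_forward(1.0, experts, gate_scores=[0.1, 2.0, 0.3, 1.0], k=2)
    print(f"output={output:.3f}, experts run: {len(calls)} of {len(experts)}")
```

With k=2 of 4 experts, half the expert compute is skipped for that input. Production models use far more experts than this toy does, so the active share per token is much smaller.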

When you make software three times more efficient every twelve months, your long-term demand for physical factories changes completely. The massive data centers being planned right now might be obsolete before the concrete finishes drying because the software became too smart to need them.
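
A quick way to see how fast this compounds is the sketch below. It assumes the two rates quoted above (3x algorithmic efficiency per year, 37% better performance per dollar per year) hold constant and that the target capability stays fixed. Both are simplifying assumptions, and the function names are illustrative.

```python
ALGO_GAIN_PER_YEAR = 3.0   # algorithmic efficiency figure cited above
HW_GAIN_PER_YEAR = 1.37    # 37% annual performance-per-dollar improvement cited above


def relative_compute_needed(years: int, algo_gain: float = ALGO_GAIN_PER_YEAR) -> float:
    """Fraction of today's raw compute needed to match today's model after `years`."""
    return 1.0 / (algo_gain ** years)


def relative_cost(years: int, algo_gain: float = ALGO_GAIN_PER_YEAR,
                  hw_gain: float = HW_GAIN_PER_YEAR) -> float:
    """Fraction of today's dollar cost, combining algorithmic and hardware gains."""
    return 1.0 / ((algo_gain * hw_gain) ** years)


if __name__ == "__main__":
    for years in range(0, 4):
        print(f"year {years}: compute {relative_compute_needed(years):.3f}x, "
              f"cost {relative_cost(years):.3f}x of today")
```

With these inputs, matching today's model three years out takes under 4% of today's compute. The sketch holds capability fixed, so it describes the cost of standing still, not the demand created by chasing more capable models.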

The Physical Walls of the AI Factory

Even if tech companies want to spend trillions of dollars on endless clusters of silicon, the physical world has other plans. You can't download a power plant. You can't write code to speed up the manufacturing of an industrial electrical transformer.

The constraints are hitting the entire supply chain simultaneously. TSMC’s advanced manufacturing nodes are completely booked out through 2027. High-bandwidth memory is in short supply, which has caused ordinary computer memory prices to jump significantly since early 2025.

Then there is the power grid. McKinsey estimates that AI workloads will demand 156 gigawatts of data center capacity globally by 2030. In places like Northern Virginia, the wait time to connect a new data center to the regional electric grid has stretched to between four and seven years.

This infrastructure lag is forcing companies into bizarre compromises. Tech giants are desperate. They are buying out old crypto mining facilities just to steal their grid connections. They are trying to secure gas turbines from manufacturers like GE Vernova and Siemens Energy, only to find out that production lines are sold out for years.

This physical bottleneck will force the industry to innovate on the software side. When you literally cannot buy more electricity, your only choice is to make your models smaller, faster, and cheaper. The belief that global compute will grow indefinitely presumes that our physical infrastructure can scale like software. It can't.

How to Optimize Your Tech Stack Today

Stop waiting for tech monopolies to build massive data centers that will magically solve your computing costs. If you run a business or build applications, you need to optimize for the compute realities of today.

First, audit your model usage. Stop using frontier reasoning models for basic text classification or simple customer service routing. Implement a routing layer that sends simple tasks to tiny, open-source models that run efficiently on cheap hardware.

Second, embrace local inference where it makes sense. The gap between proprietary cloud models and open-source software has closed significantly. Running smaller models directly on user devices or localized enterprise servers reduces your dependence on the strained public cloud infrastructure.

Third, demand transparency from your vendors regarding test-time compute costs. If a model is running extensive internal reasoning chains for a simple query, it is wasting your money and wasting global power. Set strict limits on token budgets and execution time.
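
The first and third recommendations can be sketched together: a router that picks a model tier by task type, and a guard that stops a token stream once the tier's budget runs out. Everything here is hypothetical, including the tier names, the budget numbers, and the task labels. A real deployment would call an actual model and tune these values from an audit of its own traffic.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass(frozen=True)
class Tier:
    name: str            # hypothetical model identifier
    max_tokens: int      # output token budget for this tier
    max_seconds: float   # wall-clock budget for this tier


SMALL = Tier("small-open-model", max_tokens=256, max_seconds=2.0)
FRONTIER = Tier("frontier-reasoning-model", max_tokens=8192, max_seconds=120.0)

# Task labels assumed to be cheap; adjust after auditing real traffic.
SIMPLE_TASKS = frozenset({"classification", "ticket_routing"})


def choose_tier(task_type: str) -> Tier:
    """Send known-simple tasks to the small model, everything else to the frontier model."""
    return SMALL if task_type in SIMPLE_TASKS else FRONTIER


def enforce_budget(tokens: Iterable[str], tier: Tier,
                   clock: Callable[[], float] = time.monotonic) -> Iterator[str]:
    """Yield tokens until the tier's token or time budget is exhausted."""
    deadline = clock() + tier.max_seconds
    stream = iter(tokens)
    emitted = 0
    # Check both limits before pulling the next token from the stream.
    while emitted < tier.max_tokens and clock() < deadline:
        try:
            token = next(stream)
        except StopIteration:
            return
        yield token
        emitted += 1


if __name__ == "__main__":
    tier = choose_tier("classification")
    fake_stream = (f"tok{i}" for i in range(1000))  # stand-in for a model's token stream
    output = list(enforce_budget(fake_stream, tier))
    print(f"{tier.name}: kept {len(output)} tokens")
```

Cutting a stream on the client side does not necessarily stop the provider from computing, so in practice the same limits should also be passed to whatever output cap or timeout setting your vendor exposes.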

The future doesn't belong to the companies that burn the most electricity. It belongs to the ones who get the most answers out of every single watt. Turn down the raw power and turn up the optimization. Everything else is just expensive noise.