The GPU Scarcity Paradox
By Lucas Ewing
TL;DR
The GPU shortage isn't what you think. The industry doesn't have a supply problem; it has a utilization problem masquerading as one. We are extracting barely a third of the available compute from the hardware we've already deployed.
When low utilization looks like scarcity, the instinct is to buy more. But as software optimization catches up, operators will discover they have more capacity than they realized — and the competitive advantage will shift from who owns the most GPUs to who uses them best.
The Infrastructure Arms Race
Microsoft, Google, Meta, and Amazon are on track to spend nearly $700 billion on AI infrastructure in 2026.1 Amazon alone committed $131 billion in CapEx in 2025, with expectations for $200 billion in 2026.2 The hypothesis driving all this investment has remained the same since late 2023: GPU supply is structurally constrained, and if you don't secure capacity now, you will fall behind.
For a while, that thesis was hard to argue with. In 2023 and 2024, wait times for H100 GPUs stretched eight to twelve months on AWS and Azure.3 H100 cards traded on secondary markets for north of six figures — a 300% premium over NVIDIA's list price.4 Real constraints existed, and in some cases, capacity remains genuinely scarce.
But utilization data tells a different story. Anyscale reports that production AI workloads achieve well below 50% sustained GPU utilization, even under load.5 A survey by ClearML and the AI Infrastructure Alliance found that over 75% of organizations report GPU utilization below 70% at peak, dropping to 30% off-peak.6
GPU scarcity is real, but the market overstates it because it conflates "allocated" with "actually working." Once you measure what GPUs are really doing, a lot of the "shortage" turns out to be waste — waste that persists despite demand because nobody measures it and because buying more GPUs is easier than fully using the ones you have.
The Measurement Problem
Walk into any data center and ask about GPU utilization, and someone will likely show you a dashboard reading 95%.
Most teams think their GPUs are fully utilized because they're looking at the wrong metric. Their dashboards show allocation — how many GPUs are claimed — not how much actual compute is happening per GPU. That gap is where the waste hides.
Anyscale's 2026 report confirms the pattern: most teams track GPU utilization through nvidia-smi or orchestrator-level dashboards rather than hardware profiling counters.5 The default monitoring path reports metrics that overstate utilization by 50–70 percentage points.5
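The gap is easy to see in code. Below is a minimal sketch, assuming NVIDIA's Python bindings (the nvidia-ml-py package): the number behind nvidia-smi's utilization column measures kernel residency, not how busy the streaming multiprocessors actually are. The DCGM profiling fields mentioned in the comments are the usual way to get the latter; verify the field IDs against your DCGM version.

```python
# Minimal sketch: what the default metric actually measures.
# Requires nvidia-ml-py (pip install nvidia-ml-py) and an NVIDIA GPU.
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

# This is the number behind nvidia-smi's "GPU-Util" column: the fraction
# of the sample window in which *any* kernel was resident. One tiny kernel
# running continuously reads as 100% even if most SMs are dark.
util = pynvml.nvmlDeviceGetUtilizationRates(handle)
print(f"kernel-residency 'utilization': {util.gpu}%")
print(f"memory controller utilization:  {util.memory}%")

# Seeing real work requires hardware profiling counters, e.g. DCGM's
# SM_ACTIVE / SM_OCCUPANCY / TENSOR_ACTIVE fields:
#   dcgmi dmon -e 1002,1003,1004
# These sample the SMs themselves and routinely read far below "GPU-Util".
pynvml.nvmlShutdown()
```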
This isn't a minor discrepancy. Every downstream decision — capacity planning, procurement, scaling — operates on data that conflates "allocated" with "working." Teams buy more GPUs because dashboards say they need more, when the real problem may be extracting value from what they have.
Fixing the measurement is the prerequisite to fixing everything else, because you can't optimize what you can't see.
Once teams correctly measure waste, it becomes visible: idle time between bursty training steps, overprovisioned "warm pools" kept alive to prevent inference cold-starts, and compute silicon starved of data by slow storage pipelines.
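The first of those failure modes, idle gaps between steps, is the easiest to quantify once you have a profiler trace. Here's a toy illustration; the interval data is invented for the example.

```python
# Toy illustration: how bursty training steps hide idle time.
# The intervals are invented; in practice they come from a profiler trace.
def busy_fraction(intervals, wall_start, wall_end):
    """Fraction of wall-clock time spent executing kernels.
    intervals: list of (start, end) seconds the GPU was busy."""
    busy = sum(end - start for start, end in intervals)
    return busy / (wall_end - wall_start)

# A 10-second run where each 1-second step does 300 ms of GPU work and
# spends the rest on data loading, checkpointing, and collective waits.
steps = [(i, i + 0.3) for i in range(10)]
print(f"busy fraction: {busy_fraction(steps, 0, 10):.0%}")  # -> 30%
```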
From Waste to Idle
Here's the twist: when you fix utilization, you don't eliminate waste — you move it.
Better utilization packs work onto fewer GPUs, which means the rest sit completely idle. Consolidate a 100-GPU cluster running at 35% busy onto 35 fully loaded machines, and 65 GPUs now do nothing at all. Per-GPU waste and idle capacity are two sides of the same problem: fixing one creates the other.
Even before you factor in optimization efforts, idle capacity is a persistent reality in almost every workload: between training runs, when inference traffic drops, and across seasonal demand troughs. GPU clusters are sized for peak demand, and peak demand is intermittent by nature.6
This idle capacity isn't just a technical inefficiency; it's a financial time bomb. Cloud operators are taking on debt to buy GPUs, but silicon is a rapidly depreciating asset. With new architectures arriving roughly every 18 months, revenue per GPU is in a race to the bottom.
If an operator cannot extract value from a chip during its short window of relevance, the debt will outlive the hardware.
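The arithmetic makes the stakes concrete. Every number in the sketch below is a hypothetical assumption for illustration, not market data:

```python
# Back-of-the-envelope sketch. All figures are hypothetical assumptions.
PRICE = 30_000          # $ per GPU (assumed purchase price)
WINDOW_MONTHS = 18      # assumed relevance window before the next architecture
hours = WINDOW_MONTHS * 30 * 24  # ~12,960 sellable hours per GPU

for util in (0.35, 0.90):
    # Revenue per sold GPU-hour needed just to recover the hardware cost
    # before the chip's window closes (ignoring power, cooling, interest).
    breakeven = PRICE / (hours * util)
    print(f"at {util:.0%} utilization: ${breakeven:.2f}/GPU-hour to break even")
```

Under these assumptions, the 35% operator must charge roughly two and a half times what the 90% operator charges for the same chip, before power and interest are even counted.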
In this market, "buying more" isn't a strategy — it's a default in slow motion.
The survivors will be the neo-clouds that leverage every ounce of compute available.
One could argue this is simply how infrastructure works — power plants, airlines, and telecom networks all carry reserve capacity. Idle hardware is expected and priced into the business model.
But GPU infrastructure differs in a critical way: the workloads are software-defined. An idle airplane can't dynamically serve a different route for 45 minutes. An idle GPU, however, can run a different workload if the orchestration exists to place it there safely and reclaim it quickly.
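In Kubernetes terms this is priority and preemption; the toy sketch below shows the shape of the idea. Every class and method name here is invented for illustration, not any particular scheduler's API:

```python
# Toy sketch of opportunistic placement with instant reclaim.
# All names are invented for illustration.
from dataclasses import dataclass
from typing import Optional

@dataclass
class GPU:
    idx: int
    primary: Optional[str] = None        # latency-sensitive owner workload
    opportunistic: Optional[str] = None  # preemptible filler job

class ToyScheduler:
    def __init__(self, n_gpus: int):
        self.gpus = [GPU(i) for i in range(n_gpus)]

    def place_opportunistic(self, job: str) -> Optional[int]:
        """Filler work only lands on GPUs nobody is using."""
        for g in self.gpus:
            if g.primary is None and g.opportunistic is None:
                g.opportunistic = job
                return g.idx
        return None  # no idle capacity right now

    def place_primary(self, job: str) -> int:
        """Owner work preempts filler on contact; the filler must
        checkpoint fast enough that reclaim looks instant."""
        for g in self.gpus:
            if g.primary is None:
                if g.opportunistic is not None:
                    print(f"preempting {g.opportunistic} on gpu{g.idx}")
                    g.opportunistic = None
                g.primary = job
                return g.idx
        raise RuntimeError("cluster at peak; no GPU to reclaim")

sched = ToyScheduler(2)
sched.place_opportunistic("batch-inference")  # lands on gpu0
sched.place_primary("training-step")          # preempts it immediately
```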
For operators who have already invested in the hardware, idle GPUs represent pure cost — power, cooling, depreciation — with zero return. Yet, as peak demand grows, leadership is left with the same decision: buy more.
The Orchestration Gap
The fiber optic bubble of 1996–2001 offers a useful parallel for today's GPU buildout.
Telecom companies invested over $500 billion laying 80 million miles of fiber across North America. By 2005, 85% of it was dark.7 The bust destroyed $2 trillion in market value. Yet the overbuild created the foundation for the modern internet.
The lesson wasn't that the investment was wasted. It was that value came not from laying the most fiber, but from software that could light it, route through it, and monetize dormant capacity.
AI infrastructure is at a comparable stage. According to McKinsey, building the data center capacity forecast for 2030 will require nearly $7 trillion in CapEx.8 Whether that capital flows into raw silicon, advanced cooling, or real estate, the return on that massive physical investment is entirely dictated by the orchestration layer.
Right now, hundreds of billions flow to hardware, and a fraction funds the orchestration layer that determines whether that hardware produces value.9
The true winners of the fiber crash weren't the companies that dug the most trenches.
While giants holding the debt for millions of miles of "dark fiber" went bankrupt, companies like Equinix survived and went on to run the world's largest internet exchanges.10 Their secret? They realized raw physical cable was a low-margin race to the bottom.
Instead of competing on sheer volume, they built the interconnection layer — the routing and orchestration hubs that allowed fragmented, overbuilt networks to actually talk to each other. They turned static glass into a fluid, monetizable market.
Today, the broader industry is repeating the telecom mistake, hoarding "dark silicon" and treating raw hardware volume as the only competitive advantage. But the value is moving up the stack.
The survivors won't be the operators with the largest hardware fleets — they will be the ones who adopt the Equinix playbook. By integrating an intelligent orchestration layer, a data center can evolve from a depreciating hardware warehouse into a high-margin compute exchange.
More GPUs are needed. But the industry's most expensive problem is not the GPUs it cannot buy — it is the ones it already owns and cannot use.
This is Part 1 of a two-part series. In our upcoming technical deep-dive, we will break down exactly where this waste hides at the hardware level — and the engineering required to fix it.
Idle compute is the problem we're building Lilac to solve. We're building the orchestration layer that turns underutilized GPU clusters into revenue — identifying reclaimable capacity, scheduling external workloads into it, and returning resources the instant primary workloads need them. If you're running GPU infrastructure with idle capacity, we'd like to talk. If you want to run inference on that idle capacity, check our inference API pricing.
Sources
1. CNBC. "Google, Microsoft, Meta, and Amazon ramp AI cash." (Feb 6, 2026).
2. TechCrunch. "Amazon and Google are winning the AI CapEx race." (Feb 5, 2026).
3. LLM Utils. "NVIDIA H100 GPUs: Supply and Demand."
4. Jarvislabs. "H100 GPU Price Analysis."
5. Anyscale. "GPU In-Efficiency in AI Workloads." (Jan 21, 2026).
6. AI Infrastructure Alliance / ClearML. "The State of AI Infrastructure at Scale 2024."
7. Internet History. "Boom, Bubble, Bust: The Fiber Optic Mania."
8. McKinsey. "The cost of compute: A $7 trillion race to scale data centers."
9. Mordor Intelligence. "AI Infrastructure Market Analysis."
10. DrPeering. "How Equinix Beat MAE-East: IX Playbook Tactics."