Why the AI boom is about to hit a wall
The Reality of AI Capacity Constraints
Recent earnings calls from major tech companies have highlighted a critical shift in the technology sector: despite massive capital expenditures, the world’s most valuable software companies remain deeply capacity constrained. For example, despite a $190 billion capital expenditure plan, Microsoft still cannot secure enough chips packaged with the necessary memory to meet its own demand.
When hyperscalers mention being “capacity constrained,” it does not mean they have simply run out of GPUs. The supply problem exists a layer deeper than the graphics card. It is an industrial constraint centered around whether manufacturers can physically build and package enough chips with the high-bandwidth memory required to sustain modern AI workloads.
AI vendor contracts are no longer traditional software agreements; they are essentially industrial supply contracts. Because each contract ultimately depends on hyperscaler capacity, it needs strict allocation terms, capacity guarantees, and fallback plans.
Moving Beyond the “Software” Abstraction
A core issue in AI procurement is that buyers still treat AI as a traditional software product with a complex backend. While tools like ChatGPT, Copilot, Gemini, and Claude look like standard applications, the infrastructure supporting them is entirely different.
Every generated token is the output of a physical production system—an “AI factory.” This factory involves:
- Silicon & Logic: The chips running the mathematical operations.
- High-Bandwidth Memory (HBM): The components feeding data to the chips.
- Packaging: The technology integrating chips and memory together.
- Networking & Optics: The systems moving data across the compute cluster.
- Power & Cooling: The resources keeping the hardware online and at functional temperatures.
- Real Estate & Construction: The physical land and development of the data center.
- Operations Talent: The personnel required to keep the factory highly utilized.
The traditional “cloud era” abstraction, where infrastructure is invisible and compute is infinitely elastic, is broken. AI intelligence requires a physical bill of materials, forcing tech giants to operate as heavy industrial infrastructure companies. Consequently, Meta Platforms, Amazon, and Google are collectively committing hundreds of billions of dollars to hardware and data center construction.
Bottlenecks in the AI Supply Chain
Understanding where the AI supply chain breaks down is critical for modern procurement and engineering teams. A bottleneck at any of the following layers can prevent an AI vendor from delivering intelligence:
The Compute Module
A single chip does not produce intelligence at scale. The baseline unit of infrastructure is the module—such as Nvidia’s GB200 NVL72. This is a liquid-cooled, rack-scale system connecting 72 GPUs and 36 CPUs into a single domain, utilizing 13.5 terabytes of high-bandwidth memory.
Memory and Packaging
High-Bandwidth Memory (HBM) is the single most constrained input in the AI supply chain. Without sufficient data transfer speeds, compute capacity sits idle. Furthermore, integrating logic dies and HBM into a single functional chip requires advanced packaging (like TSMC’s CoWoS).
- Data Point: Industry estimates suggest that the four largest AI chip designers consume roughly 90% of global chip packaging capacity and 90% of HBM supply, but only 12% of advanced logic die production. The bottleneck is not chip design; it is memory and assembly.
Networking and Optics
Large AI clusters are communication machines. GPUs must move massive amounts of data continuously. At the scale of hundreds of thousands of GPUs, traditional copper connections reach their physical limits regarding heat, distance, and signal integrity, making advanced optical networking mandatory.
Power, Cooling, and Construction
The physical realities of data centers frequently limit deployment:
- Power: Global data center electricity consumption is projected to roughly double to 945 terawatt-hours by 2030. The true constraint is securing firm power at a specific location on a specific schedule.
- Cooling: Dense AI racks generate heat that legacy data centers cannot handle, making liquid cooling a mandatory part of production capacity.
- Construction Timelines: Traditional 12-to-18-month construction estimates do not apply to 500-megawatt AI campuses. Land acquisition, power transmission, and interconnection can stretch timelines to as long as four years.
The Economics of AI Compute
AI introduces a much tougher capital cycle than traditional software. GPU hardware depreciates on a 3-to-5-year timeline, while the physical data center shells last much longer. In many cases, old data centers cannot be retrofitted for next-generation hardware.
This financial reality makes token utilization the central operating metric. An AI factory with low utilization is a financial liability because the hardware depreciation clock runs regardless of whether tokens are served.
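The link between utilization and unit cost can be made concrete with a small calculation. The sketch below assumes straight-line depreciation and uses hypothetical figures (the purchase cost, the schedule length, and the peak throughput are illustrative, not taken from any vendor). It shows that the depreciation charged to each token scales inversely with utilization.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 24 * 365


@dataclass
class AcceleratorFleet:
    """Hypothetical hardware pool; every figure here is an illustrative assumption."""
    capex_usd: float            # purchase cost of the hardware
    depreciation_years: float   # straight-line schedule, e.g. 3 to 5 years
    peak_tokens_per_sec: float  # throughput when fully loaded


def hourly_depreciation(fleet: AcceleratorFleet) -> float:
    """Depreciation charged every hour, whether or not tokens are served."""
    return fleet.capex_usd / (fleet.depreciation_years * HOURS_PER_YEAR)


def depreciation_per_million_tokens(fleet: AcceleratorFleet, utilization: float) -> float:
    """Depreciation cost carried by each million tokens at a given utilization (0 < u <= 1)."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    tokens_per_hour = fleet.peak_tokens_per_sec * 3600 * utilization
    return hourly_depreciation(fleet) / tokens_per_hour * 1_000_000


if __name__ == "__main__":
    fleet = AcceleratorFleet(capex_usd=3_000_000, depreciation_years=4, peak_tokens_per_sec=50_000)
    for u in (1.0, 0.5, 0.25):
        cost = depreciation_per_million_tokens(fleet, u)
        print(f"utilization {u:.0%}: ${cost:.3f} depreciation per million tokens")
```

Under these assumptions, halving utilization doubles the depreciation carried by every token served. The sketch ignores power, networking, and staffing costs, which a real model would need to include.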
Efficiency Gains vs. Jevons Paradox
Serving costs are falling. Advances in smaller models, distillation, caching, batching, quantization, and speculative decoding are increasing the throughput hardware can deliver. (For example, Microsoft recently increased Copilot inference throughput by 40% through software and hardware optimization.)
However, cheaper and more efficient tokens generate more demand (Jevons Paradox). Longer context windows, complex agent loops, and automated retries mean that token consumption per task can multiply as models become more capable, often faster than per-token costs fall.
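The tension between falling unit costs and rising demand comes down to simple arithmetic: total spend is per-token cost times token volume. The sketch below assumes hardware cost stays fixed while throughput rises, so a 40% throughput gain cuts per-token cost to 1/1.4 of its previous level. The demand multipliers are hypothetical inputs, not measurements.

```python
def cost_multiplier_from_throughput_gain(throughput_gain: float) -> float:
    """Per-token cost multiplier when throughput rises by `throughput_gain`
    (0.40 means +40%) and hardware cost is assumed to stay fixed."""
    if throughput_gain <= -1.0:
        raise ValueError("throughput_gain must be greater than -1")
    return 1.0 / (1.0 + throughput_gain)


def spend_multiplier(cost_multiplier: float, demand_multiplier: float) -> float:
    """Change in total spend: unit cost change times token volume change."""
    return cost_multiplier * demand_multiplier


def break_even_demand_multiplier(cost_multiplier: float) -> float:
    """Token growth at which the efficiency gain is fully absorbed by demand."""
    return 1.0 / cost_multiplier


if __name__ == "__main__":
    unit_cost = cost_multiplier_from_throughput_gain(0.40)
    print(f"per-token cost multiplier: {unit_cost:.3f}")
    print(f"break-even demand growth: {break_even_demand_multiplier(unit_cost):.2f}x")
    for demand in (1.0, 2.0, 5.0):  # hypothetical demand growth scenarios
        print(f"demand {demand:.0f}x -> spend {spend_multiplier(unit_cost, demand):.2f}x")
```

With a 40% throughput gain, any workload whose token volume grows by more than 1.4x ends up costing more in total, despite each token being cheaper.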
Redefining the AI Vendor Contract
To deploy AI natively, teams must abandon legacy assumptions about software procurement and adjust how they manage their AI infrastructure.
Rethinking Demand Forecasting
Forecasting AI adoption by “seats,” “users,” or “licenses” will inevitably lead to under-budgeting capacity or overpaying for the wrong infrastructure. Forecasting must be conducted at the workload level:
- Tokens per workflow
- Context length requirements
- Model calls per task
- Agent concurrency and continuous loops
- Latency tiers, failure rates, and retry rates
An occasional coding assistant query consumes vastly fewer resources than an autonomous agent looping for days to read repositories, write code, and run tests.
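A workload-level forecast can be sketched in a few lines. The example below is a minimal illustration, assuming each retried call is retried once and resends the full context. The two workloads and all of their numbers are hypothetical, chosen only to contrast an occasional assistant query with a looping agent.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """One workflow type; all values are hypothetical planning inputs."""
    name: str
    runs_per_day: float            # how often the workflow is triggered
    calls_per_run: float           # model calls per task
    input_tokens_per_call: float   # context actually sent on each call
    output_tokens_per_call: float  # tokens generated on each call
    retry_rate: float = 0.0        # fraction of calls retried once

    def daily_tokens(self) -> float:
        """Total tokens per day, counting retried calls at full size."""
        calls = self.runs_per_day * self.calls_per_run * (1.0 + self.retry_rate)
        return calls * (self.input_tokens_per_call + self.output_tokens_per_call)


def forecast(workloads: list[Workload]) -> dict[str, float]:
    """Daily token demand keyed by workload name."""
    return {w.name: w.daily_tokens() for w in workloads}


if __name__ == "__main__":
    workloads = [
        Workload("assistant-query", runs_per_day=200, calls_per_run=1,
                 input_tokens_per_call=2_000, output_tokens_per_call=500, retry_rate=0.02),
        Workload("autonomous-agent", runs_per_day=5, calls_per_run=400,
                 input_tokens_per_call=30_000, output_tokens_per_call=1_500, retry_rate=0.10),
    ]
    for name, tokens in forecast(workloads).items():
        print(f"{name}: {tokens:,.0f} tokens/day")
```

In this illustration, five agent runs per day consume far more tokens than two hundred assistant queries. A seat-based forecast would not capture that difference. Latency tiers and concurrency peaks are left out here and would need their own terms in a fuller model.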
Three Crucial Questions for AI Investment Reviews
When negotiating AI contracts and evaluating vendors, leadership and procurement teams must address the physical constraints of the AI supply chain. Bring the following three questions to any AI investment review:
- What share of the vendor spend guarantees reserved capacity versus a “best efforts” allocation?
  - You must establish a concrete plan for allocation tiers. If the default provider becomes supply-constrained for a month, having a “great relationship” with the vendor will not keep your applications online.
- What is the specific routing plan for utilizing cheaper models?
  - Many companies waste margins by running expensive, complex models against simple tasks. A routing layer must be established to send simpler tasks to cheaper models, alongside a method to measure cost savings without degrading the user experience.
- Where is hidden human supervision masking product failure?
  - In top AI workflows and vendor demos, humans often subtly guide the AI or correct failures. If this hidden supervision is required for the tool to function, you will struggle to scale it, price it, or fully automate it in production.
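The routing question above can be illustrated with a minimal sketch. It assumes each task already carries a complexity score between 0 and 1 from some upstream classifier, which is not shown. The tier names, prices, and threshold are hypothetical. The sketch measures cost savings against an "everything goes to the premium model" baseline. It does not measure quality, which would need a separate evaluation before any saving can be counted.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class ModelTier:
    name: str
    usd_per_million_tokens: float


# Hypothetical tiers and prices, for illustration only.
CHEAP = ModelTier("small-model", 0.50)
PREMIUM = ModelTier("large-model", 10.00)

Task = tuple[float, int]  # (complexity score in [0, 1], tokens consumed)


def route(complexity: float, threshold: float = 0.6) -> ModelTier:
    """Send tasks at or above the threshold to the premium tier, the rest to the cheap tier."""
    return PREMIUM if complexity >= threshold else CHEAP


def total_cost(tasks: Iterable[Task], router: Callable[[float], ModelTier]) -> float:
    """Sum the token cost of every task under the given routing policy."""
    return sum(tokens / 1_000_000 * router(complexity).usd_per_million_tokens
               for complexity, tokens in tasks)


def routing_savings(tasks: list[Task]) -> tuple[float, float]:
    """Absolute and fractional savings versus sending every task to PREMIUM."""
    baseline = total_cost(tasks, lambda _c: PREMIUM)
    routed = total_cost(tasks, route)
    return baseline - routed, (baseline - routed) / baseline


if __name__ == "__main__":
    tasks = [(0.2, 1_000_000), (0.9, 1_000_000), (0.4, 250_000)]
    saved_usd, saved_fraction = routing_savings(tasks)
    print(f"saved ${saved_usd:.2f} ({saved_fraction:.1%} of baseline)")
```

A fixed threshold is the simplest possible policy. In practice the threshold would be tuned against a quality benchmark so that savings are not bought with a degraded user experience.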
Summary: The New Executive Mandate
The transition to an intelligence economy forces software buyers and engineering leaders to think like industrial operators. When you purchase AI software, you are effectively buying a share in an industrial factory’s token output. Navigating this successfully requires diligent management of supply assurance, throughput optimization, capacity scheduling, and utilization.
Meta
Added: 2026-05-24