The video warns that AI inference costs are set to spike 2-3x due to a severe shortage of critical compute resources like GPUs and high-bandwidth memory, as demand from AI agents and enterprises far outpaces supply. It urges companies to act quickly to secure compute capacity and optimize efficiency, or risk being outcompeted and left behind in the rapidly evolving AI-driven economy.
The video discusses an emerging crisis in global technology infrastructure driven by the exponential growth in AI demand and a severe shortage of compute resources, particularly for AI inference. Over the past three years, the world economy has rapidly reorganized around AI capabilities, making AI infrastructure the largest capital expenditure project in history. However, the supply of critical components like high-bandwidth memory (HBM), DRAM, and advanced GPUs is physically constrained, with no significant relief expected before 2028. Major tech companies and hyperscalers such as Google, Microsoft, Amazon, and Meta have already locked up most of the available compute capacity for years, leaving enterprises to compete for the remaining scraps.
Demand for AI compute is growing at an uncapped, exponential rate, driven by increased per-worker usage and the proliferation of agentic systems—AI agents that can operate autonomously and continuously. Unlike human users, these agents have no natural rate limits and can consume vast amounts of compute resources, leading to a potential 10x to 100x increase in enterprise AI consumption. For example, a 10,000-person organization could see its annual AI inference costs balloon from $20 million to $2 billion (a 100x increase) within 18 months if current trends continue, especially as agentic workflows become more prevalent.
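The $20 million to $2 billion jump can be reproduced with a back-of-envelope calculation: today's spend multiplied by usage growth and by unit-price growth. A minimal sketch, using the figures cited in the video; the specific multipliers chosen below (40x usage, 2.5x price) are illustrative assumptions picked from the ranges the video gives, not measured data:

```python
# Illustrative cost projection: projected annual cost =
#   today's spend x usage growth (agents) x unit-price growth (hardware shortage).
# All parameters are assumptions for illustration.

def project_inference_cost(base_annual_cost, usage_multiplier, price_multiplier):
    """Return projected annual inference spend under the given growth factors."""
    return base_annual_cost * usage_multiplier * price_multiplier

base = 20_000_000          # $20M/year today for a 10,000-person org (video's example)
agentic_usage_growth = 40  # agents drive 10x-100x more consumption; 40x assumed here
price_growth = 2.5         # memory/GPU prices projected to 2-3x; 2.5x assumed here

projected = project_inference_cost(base, agentic_usage_growth, price_growth)
print(f"Projected annual inference cost: ${projected:,.0f}")  # → $2,000,000,000
```

The point of separating the two multipliers is that they compound: even a modest agentic rollout (10x usage) combined with a 2x price rise already yields a 20x budget increase.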
The supply side is fundamentally broken due to structural bottlenecks in memory and semiconductor manufacturing. The three major memory producers—Samsung, SK Hynix, and Micron—are shifting production to enterprise and AI data center segments, but new fabrication facilities take years and billions of dollars to build. High-bandwidth memory is sold out, and advanced chips from TSMC are fully allocated to hyperscalers and a few major customers. Nvidia, which dominates the AI GPU market, has its H100 and Blackwell GPUs sold out for years in advance, and alternatives from AMD and Intel are either less mature or similarly constrained.
This supply-demand imbalance is expected to cause a sharp spike in AI inference costs, with memory and GPU prices projected to double or triple over the next 18 months. Traditional enterprise IT planning frameworks—based on predictable demand, stable technology, and available supply—are now obsolete. Enterprises that fail to secure compute capacity soon will face escalating costs, unreliable access, and the risk of being outcompeted by those who act quickly. The hyperscalers, who also compete directly with their enterprise customers, are incentivized to prioritize their own AI products over selling excess capacity.
To navigate this crisis, the video recommends several strategic actions for enterprise leaders: secure compute capacity now with contractual guarantees, build a sophisticated routing layer to optimize workload allocation and maintain flexibility, treat hardware as a consumable with accelerated refresh cycles, and invest heavily in efficiency to maximize the value of every token consumed. The current situation is not just a technology problem but an economic transformation that will reshape competitive dynamics across industries. Enterprises that adapt quickly will be positioned to survive and thrive, while those that delay will fall behind in what is described as the biggest technology race in history.
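The recommended routing layer can be thought of as a constraint-solving step in front of every inference call: send each workload to the cheapest source of compute that still meets its latency requirement and has contracted capacity left. A minimal sketch of that idea; the provider names, prices, latencies, and capacities below are hypothetical, not from the video:

```python
# Minimal sketch of a workload-routing layer: for each request, pick the
# cheapest provider that satisfies the latency bound and has capacity left.
# Provider names, prices, latencies, and capacities are hypothetical.

from dataclasses import dataclass

@dataclass
class Provider:
    name: str
    cost_per_1k_tokens: float  # USD per 1,000 tokens
    latency_ms: int            # typical time-to-first-token
    capacity_tokens: int       # remaining contractually guaranteed capacity

def route(providers, tokens_needed, max_latency_ms):
    """Return the name of the cheapest eligible provider, consuming its capacity."""
    eligible = [p for p in providers
                if p.latency_ms <= max_latency_ms
                and p.capacity_tokens >= tokens_needed]
    if not eligible:
        # No provider fits: the workload must queue, degrade, or be deferred.
        raise RuntimeError("No provider can serve this workload")
    chosen = min(eligible, key=lambda p: p.cost_per_1k_tokens)
    chosen.capacity_tokens -= tokens_needed
    return chosen.name

pool = [
    Provider("reserved-cloud-a", 0.50, 300, 5_000_000),  # contracted capacity
    Provider("spot-cloud-b",     0.20, 900, 2_000_000),  # cheap but slow
    Provider("on-prem-cluster",  0.10, 200, 1_000_000),  # limited in-house GPUs
]

print(route(pool, tokens_needed=500_000, max_latency_ms=400))  # → on-prem-cluster
```

Because capacity is decremented on each call, the router naturally spills over to more expensive providers as cheaper pools are exhausted, which is the flexibility the video argues enterprises need when supply is constrained.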