Phil Kaye, Co-Founder and Director of Vespertec, argues that alternative accelerators will grow inside hyperscalers, but it’s Nvidia’s ecosystem, alongside tightening memory and cooling constraints, that will shape most deployments in the year ahead.
In 2026, I expect hyperscalers will continue investing in their own accelerators – Google will expand its TPU deployments, and Meta is also evaluating the use of Google’s TPU platform. These chips will succeed in their own environments, but they won’t meaningfully challenge Nvidia’s position outside hyperscaler walls.
The reason is simple: what keeps Nvidia ahead is the depth and completeness of its ecosystem. Beyond the hardware itself, the CUDA software stack is deeply embedded across tools and workflows that teams already understand and rely on.
This maturity means Nvidia platforms are well supported across different system architectures and slot cleanly into mixed vendor environments. For organisations building AI infrastructure at scale, interoperability and software consistency make adoption far easier without forcing a wholesale redesign of existing estates or locking them into a single-vendor approach.
Alongside his prediction that Nvidia will continue to dominate the market, Phil offered several other insights:
Private cloud makes a comeback, re-establishing itself as a central part of AI architecture
Earlier this month, AWS and Nvidia introduced the Private AI Factory, offering customers access to high-performance compute in an environment they can control more directly. When an operator at AWS’s scale invests in a model that brings advanced compute closer to where the data resides, the rest of the industry pays attention. In 2026, expect a noticeable shift back toward private cloud infrastructure as others follow their lead.
This is not a rejection of public cloud or nostalgia for older deployment patterns, but a reflection of a growing recognition among firms of the need to control how their AI workloads run and how performance is governed. For many, regaining control over performance variables is becoming essential to achieving predictable scaling behaviour. AWS and Nvidia’s endorsement gives the model legitimacy and sets a direction the rest of the market is likely to follow.
Industry-wide AI memory stays in super-shortage cycle, redefining architecture planning
DRAM, and especially DDR5, is heading into a severe shortage cycle. Fabrication plants are shifting manufacturing capacity toward HBM3 for GPU production, which reduces the output of traditional RDIMMs just as AI servers are driving RDIMM demand to record levels. Vendors already know what their output capacity looks like for 2026, and demand will soar so sharply that a single cloud provider could realistically snap up everything the market can produce.
With that in mind, we can expect aggressive pricing and allocation-based supply. But contrary to popular belief, it is procurement lead time as much as price that poses the biggest risk for buyers. The days of last-minute expansions for performance upgrades are behind us. Organisations will need to plan far ahead and engage with partners earlier to gain an advantage.
NAND faces the same underlying pressure.
As demand for large, GPU-dense servers increases, so does the requirement for high-capacity, high-performance SSDs to support data-heavy AI workloads. This is driving sustained demand for enterprise NAND and contributing to ongoing price firming, with prices expected to continue rising gradually through 2026.
Organisations should plan for higher baseline costs and longer procurement timelines, particularly for high-capacity enterprise SSDs used in large-scale server deployments.
Cooling innovation moves beyond proof of concept, selectively
Cooling is entering a new phase, with immersion continuing to mature beyond proof of concept. However, direct liquid cooling (DLC) remains the primary choice for the most powerful AI systems. Technologies such as NVLink are not currently compatible with immersion environments, meaning the highest-performance, most thermally demanding platforms still depend on DLC.
As rack power moves beyond 70kW and airflow becomes harder to manage, these systems are the ones pushing cooling limits, yet they are also the least suited to immersion. This helps explain why immersion has not seen broader uptake so far.
That said, the ecosystem around immersion is improving. Interoperability testing is increasing, tooling and service models are maturing, and more vendors are prepared to support immersion where it makes sense. Immersion will not be mainstream in 2026, but it will feature more often in design discussions as organisations acknowledge that traditional cooling approaches will not always match future compute ambitions.
This article is part of our DCR Predicts 2026 series. Come back every week in January for more.