Why Tech Giants Are Quietly Building Chips to Cut Nvidia Out
The cloud is engineering its own AI silicon — a defensive play that could reshape margins, supply chains, and who wins the AI profit pool

Illustration by IMF Alpha editorial · Reviewed by Pedro Marini
For years the debate focused on model architecture, datasets, and frameworks. Lately the fight has moved deeper — down to die and wafer. Cloud providers are pouring money into custom AI chips to cut reliance on Nvidia and to lower cost per inference at scale.
This isn't vaporware. Google has run Tensor Processing Units in its data centers since the mid-2010s. Amazon built Inferentia and Trainium to offer lower-cost alternatives inside AWS. Microsoft still leans on Nvidia for many workloads, but it also funds bespoke hardware and tight vendor partnerships. Meta and a few other hyperscalers have quietly prototyped accelerators for internal use. It feels a bit like the server era replaying itself, when hyperscalers shifted from off-the-shelf parts to custom boards and racks.
Nvidia currently powers most large training runs and inference clusters. That gives it pricing power and creates a potential choke point. Hyperscalers do the math: across billions or trillions of inferences, even a small per-unit charge adds up to a persistent tax on margin.
Custom silicon is more than cheaper chips. You can design for the exact precision, memory bandwidth, and interconnect patterns a service needs. Over large volumes, modest per-inference gains compound into real margin improvement.
The software environment is less hostile to new silicon than it was five years ago. ONNX and a growing set of open compiler projects make targeting non-Nvidia accelerators feasible in ways that used to be impractical.
What's interesting here is the combination: economics plus better software makes the proposition realistic rather than theoretical.
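To make the margin arithmetic concrete, here is a minimal back-of-the-envelope sketch in Python. Every number in it is a made-up placeholder, not a figure from any vendor or cloud provider, and the function names `annual_savings` and `breakeven_years` are illustrative only. It shows how a small per-inference saving scales with volume and how long a hypothetical chip program would take to pay for itself.

```python
def annual_savings(inferences_per_day: float,
                   gpu_cost_per_1k: float,
                   custom_cost_per_1k: float,
                   days: int = 365) -> float:
    """Yearly dollars saved by serving the same volume on cheaper silicon.

    Costs are expressed per 1,000 inferences; all inputs are assumptions.
    """
    saving_per_inference = (gpu_cost_per_1k - custom_cost_per_1k) / 1000.0
    return inferences_per_day * saving_per_inference * days


def breakeven_years(program_cost: float, yearly_saving: float) -> float:
    """Years needed for savings to cover a one-off chip program cost."""
    if yearly_saving <= 0:
        raise ValueError("no saving, so the program never breaks even")
    return program_cost / yearly_saving


if __name__ == "__main__":
    # Hypothetical inputs: 10 billion inferences a day, $0.20 vs $0.15 per 1k,
    # and a $500M design-and-deployment budget. None of these are real data.
    saving = annual_savings(10e9, 0.20, 0.15)
    years = breakeven_years(500e6, saving)
    print(f"annual saving: ${saving:,.0f}")
    print(f"break-even:    {years:.1f} years")
```

With these placeholder inputs a five-cent difference per thousand inferences is worth well over a hundred million dollars a year, which is why the payback only looks attractive at hyperscaler volumes. Cut the volume by a factor of ten and the break-even stretches by the same factor.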
Building chips is costly and messy. A few counterpoints to keep in mind:
Nvidia’s advantage goes beyond silicon. CUDA, the developer ecosystem, and optimized kernels create heavy switching costs. Models tuned for CUDA often need substantive rework to reach parity on new hardware.
Scale helps specialists. Nvidia’s manufacturing partners, packaging, and thermal design know-how shorten time-to-market. A cloud provider that diverts billions into hardware is taking a strategic gamble compared with simply buying proven components.
Fragmentation could slow progress. If each cloud has its own quirks, model portability suffers and startups may prefer one predictable vendor rather than juggling multiple runtimes.
So yes, custom chips can win cost and control, but the road is full of engineering and ecosystem traps.
Gradual verticalization: Hyperscalers shift more inference — and some training — to their silicon, while Nvidia retains the top-tier, research-heavy training market. Cloud margins tighten, but Nvidia keeps the highest-end slices.
Faster commoditization: Open standards or fierce pricing force Nvidia to compete on price and openness. That speeds some innovation but compresses Nvidia’s margins.
Fragmented equilibrium: Different clouds optimize for different workloads, and the industry learns to live with heterogeneity. Messy for developers, but fertile ground for chip experimentation.
None of these is guaranteed; the industry could slide between them depending on cost curves, software progress, and manufacturing realities.
Headcount and CapEx in hardware teams at major clouds. A steady rise in chip engineering hires is a clearer signal than a press release.
Tooling investments: compilers, ONNX optimizations, cross-platform runtimes. Those show whether a chip is strategic or just exploratory.
For startups, multi-cloud portability will become a practical requirement sooner rather than later. Betting everything on CUDA-only deployments looks riskier today.
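One way a startup might hedge against a CUDA-only bet is to pick its inference backend at runtime rather than hard-coding it. The sketch below is pure Python and assumes the model has been exported to ONNX. The provider strings are real ONNX Runtime execution-provider names, but the preference order and the helper `pick_providers` are assumptions for illustration. In practice the `available` list would come from `onnxruntime.get_available_providers()`, and the result would be passed as the `providers` argument to `onnxruntime.InferenceSession`.

```python
from typing import Iterable, List, Sequence

# Preference order is an assumption for this sketch: try accelerators first,
# then fall back to CPU so the service still runs on any machine.
DEFAULT_PREFERENCE: Sequence[str] = (
    "CUDAExecutionProvider",      # Nvidia GPUs
    "OpenVINOExecutionProvider",  # Intel hardware
    "CPUExecutionProvider",       # portable fallback
)


def pick_providers(available: Iterable[str],
                   preferred: Sequence[str] = DEFAULT_PREFERENCE) -> List[str]:
    """Return the preferred providers that this machine offers, in order."""
    available_set = set(available)
    chosen = [name for name in preferred if name in available_set]
    if not chosen:
        raise RuntimeError("none of the preferred providers is available")
    return chosen


if __name__ == "__main__":
    # Hypothetical machine without an Nvidia GPU.
    print(pick_providers(["OpenVINOExecutionProvider", "CPUExecutionProvider"]))
```

The point of the design is that the deployment target becomes configuration rather than code: adding a new accelerator means adding one name to the preference list, provided a runtime backend for it exists.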
This is not only a hardware contest. It’s about who controls economics and the full stack. Hyperscalers building silicon is a logical next move in their effort to own more of the customer experience and margin. Nvidia won't be displaced overnight, but the stakes are high: whoever controls both chip and stack captures a disproportionate share of downstream value.
I think of cloud silicon as a long game — slow, iterative, and consequential. Expect splashy headlines when new chips ship at scale, but the quieter signals — internal benchmarks, customer price cards, and the toolchain choices — will show the real direction first.
