On-device AI Is Eating the Cloud GPU Boom — Who Wins, Who Loses
Mobile chips, OEM OS hooks and model compression are quietly redirecting billions in GPU spend. Expect winners in silicon, winners in tooling—and a few casualties.

Illustration by IMF Alpha editorial · Reviewed by Pedro Marini
The headline is simple: as phones and laptops get smarter on their own, less routine inference gets sent to expensive cloud GPUs. That doesn’t mean the data‑center era ends; it means the data center reshapes around different roles.
A year ago this felt like a niche bet. Now device makers ship NPUs and SDKs that let compact LLMs run locally with latency and privacy advantages enterprises actually care about. Couple that with aggressive quantization, pruning, and smarter caching, and a slice of inference demand that used to live in cloud racks is quietly migrating to the edge.
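To make the quantization point concrete, here is a back-of-the-envelope sketch of how weight precision changes the memory a model needs. The 3-billion-parameter size is a hypothetical chosen for illustration, not a figure from any vendor, and the arithmetic covers weights only (no activations, KV cache, or runtime overhead), so treat the numbers as rough lower bounds.

```python
def weight_memory_gib(num_params: float, bits_per_weight: int) -> float:
    """Approximate memory for model weights alone, in GiB."""
    bytes_total = num_params * bits_per_weight / 8
    return bytes_total / 2**30


# Hypothetical compact model size (an assumption for illustration).
PARAMS = 3e9

# Compare half-precision weights with 8-bit and 4-bit quantized weights.
footprints = {bits: weight_memory_gib(PARAMS, bits) for bits in (16, 8, 4)}

for bits, gib in footprints.items():
    print(f"{bits}-bit weights: {gib:.2f} GiB")
```

Under these assumptions, going from 16-bit to 4-bit weights cuts the footprint to a quarter, which is roughly the difference between a model that crowds a phone's memory and one that fits alongside everything else running on it.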
A few concrete shifts you’ll see
Think back to when app stores lured ad dollars away from the web. It didn’t happen overnight, it was messy, and it created new vendor classes. We’re seeing something similar here — except the contest is over compute, not distribution.
Net effect: AI compute is maturing. Blanket cloud inference is fading, and a more nuanced, hybrid economic model is taking shape. That’s good for specialization: nimble chipmakers, focused tooling startups, and cloud providers that adapt will find opportunities. Those that assume the old full‑stack cloud model will persist risk being caught off guard.
— Pedro Marini
