
Data For AI

The Real AI Gold: Why Data Infrastructure Will Outperform Models

As model architectures stabilize, the next competitive moat is the messy work of data pipelines, labeling and marketplaces — and investors are starting to notice.

Pedro Marini

June 21, 2026 · 4 min read


Illustration by IMF Alpha editorial · Reviewed by Pedro Marini


Tickers mentioned

SNOW +2.40% · PLTR -1.30% · MSFT +0.80% · GOOGL +1.10% · AMZN +0.50% · MDB -0.90% · AI +3.20%

Why the fuss about data now?

Model headlines (bigger transformers, fancier multimodal tricks) grab attention. The quiet, costly work of data sourcing, cleaning and governance rarely makes the front page. For years the obvious levers were compute and model size; now the pendulum is swinging toward data quality and access. Think less oil and more soil: fertile, tended, and compounding over time. It sounds boring, and it matters more than most people realize.

A short history to keep in mind

Early machine learning ran on clever models and scarce labeled sets. Teams squeezed performance out of small, curated datasets.
The last five years flipped the script: scale of data plus scale of compute produced foundation models that generalize in surprising ways.
With foundation-model architectures widely available, the next frontier is proprietary data and the plumbing that converts raw logs, images and documents into training-grade datasets. Yes, that means the messy ETL and annotation work.

Where value is being created today

Data marketplaces and clean rooms are turning private and third-party datasets into monetizable, privacy-aware assets. Snowflake Marketplace is a clear example of data being packaged and transacted.
Labeling and annotation platforms are the bridge between raw data and usable training examples. Cloud vendors that bundle labeling into their ML suites are making the path to production smoother, and stickier.
Synthetic data and programmatic labeling lower dependence on costly human annotation and reduce exposure of sensitive records. They help, but they are not a perfect substitute.
Data ops — versioning, lineage, cataloging — are becoming table stakes for any enterprise that wants reproducible ML at scale.

What's interesting here is how these pieces interact: better marketplaces drive more labeling demand, which in turn increases the need for governance and lineage.
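To make the programmatic labeling mentioned above a little more concrete, here is a minimal sketch, not any vendor's implementation. The ticket categories, keywords and function names are hypothetical, and plain majority voting is a simple stand-in for the more sophisticated aggregation that dedicated tools may use.

```python
from collections import Counter
from typing import Callable, List, Optional

# Hypothetical labels for an illustrative support-ticket classifier.
BILLING, TECHNICAL = "billing", "technical"

# A labeling function returns a label, or None to abstain.
LabelingFunction = Callable[[str], Optional[str]]


def lf_mentions_invoice(text: str) -> Optional[str]:
    return BILLING if "invoice" in text.lower() else None


def lf_mentions_refund(text: str) -> Optional[str]:
    return BILLING if "refund" in text.lower() else None


def lf_mentions_error(text: str) -> Optional[str]:
    lowered = text.lower()
    return TECHNICAL if "error" in lowered or "crash" in lowered else None


LABELING_FUNCTIONS: List[LabelingFunction] = [
    lf_mentions_invoice,
    lf_mentions_refund,
    lf_mentions_error,
]


def weak_label(text: str, lfs: List[LabelingFunction] = LABELING_FUNCTIONS) -> Optional[str]:
    """Combine labeling-function votes by simple majority.

    Returns None when every function abstains or the top vote is tied,
    so those examples can be routed to human annotators instead.
    """
    votes = [v for v in (lf(text) for lf in lfs) if v is not None]
    if not votes:
        return None
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie: no confident label
    return ranked[0][0]
```

The abstain-on-tie behavior is the design choice worth noticing: cheap heuristics label the easy cases, and the ambiguous ones still go to people, which is why this approach reduces rather than replaces human annotation.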

Why investors should care

Margins. Data infrastructure often sells like software with recurring revenue. Once datasets and pipelines are embedded, switching costs go up.
Durability. Proprietary data partnerships and curated marketplaces create moats that are harder to replicate than a model checkpoint anyone can download.
Diversification. Backing data suppliers and tooling spreads exposure beyond a single-model bet and ties value to real-world signals across industries.

This is not a slam dunk, but it is a different risk profile than betting only on model architects.

Counterpoints and risks

Commoditization is real. If third-party data becomes indexed and standardized, pricing pressure follows.
Regulation on provenance and consent is tightening. Compliance will add cost and may restrict useful datasets.
Overreliance on synthetic data can mislead models if the generated distributions drift from reality.
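The synthetic-data risk above can be checked with a simple distribution comparison. The sketch below uses the population stability index (PSI), one common drift measure chosen here for illustration rather than anything the vendors discussed are known to use. The function names, the ten-bin default and the smoothing constant are all assumptions.

```python
import math
from typing import List, Sequence


def histogram_shares(values: Sequence[float], edges: Sequence[float],
                     eps: float = 1e-6) -> List[float]:
    """Share of values per bin; out-of-range values fall into the edge bins."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        idx = 0
        for i in range(len(edges) - 1):
            if v >= edges[i]:
                idx = i  # keep the last bin whose lower edge is <= v
        counts[idx] += 1
    total = len(values)
    # Floor each share at eps so the logarithm below is always defined.
    return [max(c / total, eps) for c in counts]


def population_stability_index(real: Sequence[float],
                               synthetic: Sequence[float],
                               bins: int = 10) -> float:
    """PSI between a real sample and a synthetic one, binned on the real range."""
    lo, hi = min(real), max(real)
    if hi == lo:
        raise ValueError("real sample has no spread; cannot build bins")
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins + 1)]
    p = histogram_shares(real, edges)
    q = histogram_shares(synthetic, edges)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))
```

A commonly quoted rule of thumb treats a PSI below roughly 0.1 as stable and above roughly 0.25 as a significant shift, though the right threshold depends on the feature and the use case.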

Practical signals to watch

Partnerships between cloud giants and niche data providers; those deals often set distribution patterns and pricing power.
Growth in data marketplace transactions and in-platform billing on exchanges — billing is a surprisingly good leading indicator.
Adoption metrics for governance tools: active catalogs, production lineage coverage, and the rate of labeled dataset growth in production.
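Vendors do not report "lineage coverage" in a standard way, so any calculation rests on a definition. The sketch below assumes one plausible version: the share of production datasets that have at least one recorded upstream source. The `Dataset` fields and the example names in the usage note are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Dataset:
    name: str
    in_production: bool
    # Names of recorded upstream sources; empty means no lineage captured.
    upstream: List[str] = field(default_factory=list)


def lineage_coverage(datasets: List[Dataset]) -> float:
    """Fraction of production datasets with at least one recorded upstream source."""
    production = [d for d in datasets if d.in_production]
    if not production:
        return 0.0
    covered = sum(1 for d in production if d.upstream)
    return covered / len(production)
```

For example, a catalog with three production datasets, two of which record their sources, scores two thirds; datasets that are not in production are ignored.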

A quick investor map

Big cloud and data-platform incumbents are folding data services into their stacks. That gives them optionality: either capture more of the data value chain or partner with specialists.
Specialist vendors — labeling, synthetic data, data ops — can expand multiples if they prove enterprise retention and pricing power. Watch churn and net revenue retention closely.
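For readers who want the arithmetic behind those two metrics, here is a minimal sketch using the definitions most often seen in software reporting. Companies define these differently, so treat the formulas as an assumption, and note that the figures in the usage note are made up for illustration.

```python
def net_revenue_retention(starting: float, expansion: float,
                          contraction: float, churned: float) -> float:
    """Recurring revenue kept from an existing customer cohort, including upsells.

    All inputs are recurring-revenue amounts for the same cohort and period.
    """
    if starting <= 0:
        raise ValueError("starting revenue must be positive")
    return (starting + expansion - contraction - churned) / starting


def gross_revenue_churn(starting: float, contraction: float,
                        churned: float) -> float:
    """Share of starting recurring revenue lost to downgrades and cancellations."""
    if starting <= 0:
        raise ValueError("starting revenue must be positive")
    return (contraction + churned) / starting
```

With a hypothetical cohort that starts the year at 1,000,000 in recurring revenue, adds 180,000 in expansion, loses 30,000 to downgrades and 50,000 to cancellations, net revenue retention works out to 110% and gross revenue churn to 8%. A figure above 100% means the existing customer base grows even before any new customers are signed.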

The shiny race to build bigger models will go on. But the organizations that control the messy, sticky plumbing of data are likely to be the strategic winners over the next five years. For investors tired of putting all chips on model makers, the quietly growing world of data infrastructure offers a more defensible, revenue-bearing route into the AI era.

Author note: I track product integrations and billing signals more than press releases. The shift toward data-first strategies feels like the end of one chapter and the start of a more boring — but far more profitable — one.

Related coverage

Data For AI · 4 min

Wall Street's New Gold: How Transaction Data Is Powering Finance-Grade AI

A quiet market is forming where banks, retailers and data brokers sell the high-quality transaction signals that are reshaping trading, lending and fintech products.

By Pedro Marini

News · 4 min

On-Device AI Is Eating the Cloud: What Investors and Users Need to Know

Tiny models on phones are reshaping privacy, chip demand, and cloud revenue. A practical guide for investors, product teams, and power users.

By Pedro Marini

News · 3 min

Offline Chat, Online Fallout: How On‑Device AI Is Rewiring Phones, Privacy and Profits

Running large language models on your phone is no longer fantasy. Expect faster replies, tighter privacy, new app economics—and a few market shakeups.

By Pedro Marini
