Who Owns the Data That Trains AI? Inside the Marketplace Gold Rush
How cloud giants, startups and synthetic-data vendors are packaging, selling and protecting the raw material powering generative AI — and what it means for investors.

Illustration by IMF Alpha editorial · Reviewed by Pedro Marini
The narrative everyone repeats is simple: data is the new oil. That shorthand misses the point. Unlike oil, data becomes more valuable when combined with other sources, loses value quickly when stripped of context, and is tangled in privacy, licensing and technical frictions that make a straightforward market hard to build.
In the last 18 months, a distinct market has taken shape: data packaged specifically for AI. Snowflake and cloud marketplaces sell cleaned, labeled data feeds. Databricks and the cloud providers are adding governance tools and clean-room primitives. A wave of startups offers synthetic alternatives designed to sidestep privacy and compliance headaches. Venture money followed, and corporations started thinking differently: instead of locking their data away, perhaps they could monetize it.
Why this matters now
Who's building the roads
The friction points investors and product teams often underplay
A short history lesson
Selling slices of reality is not new. Credit bureaus, market‑data terminals and ad exchanges have done this for decades. What’s different now is scale, model sensitivity to nuance, and regulatory scrutiny after high‑profile scraping fights. A better analogy than oil might be electricity: you need infrastructure for clean, governed data before reliable applications can run.
Signals to watch next
Investment and corporate takeaways
This market is part economic opportunity, part trust architecture. The winners will be the companies that can prove three things: provenance, privacy and predictive value. Expect plenty of noise, a few genuine surprises, and a messy regulatory conversation as this shakes out.