Who Owns Data for AI: The Battle Between Licensed and Synthetic Sets
As models gobble data, licensed datasets and synthetic alternatives are reshaping who profits, who risks legal exposure, and which stocks to watch.

Illustration by IMF Alpha editorial · Reviewed by Pedro Marini
Scraped web data was the cheap fuel of the last decade. Now, datasets with clear provenance and licensing — plus synthetic alternatives — look like the real growth runway. That shift matters for platforms, data brokers, creators and investors.
Think 19th-century oilfields. Early players grabbed whatever they could find. Once infrastructure, capital and regulation arrived, leases and proven reserves determined value. Data is moving from wildcat scraping to regulated, monetized reservoirs.
This is not an on/off switch. Licensed and synthetic data will coexist and often complement each other. Still, the economics and legal realities favor firms that can prove provenance, deliver reliable labels, and productize distribution; those firms will likely extract outsized margins. For investors, that suggests tilting toward enterprise data platforms, content licensors, and cloud partners that stitch datasets into dependable products.
If you take away one thing: the crude advantage of raw scraped corpora is fading. Quality, traceability and legal clarity are becoming the premium — and that rewards whoever can prove what they sell, not just how much they scraped.
