Synthetic Data

Why Synthetic Data Became Wall Street's Newest Trade

Banks and fintechs are swapping real customer records for synthetic ones to train AI: a privacy play that creates winners, losers, and a fresh set of regulatory headaches.

Pedro Marini
July 1, 2026 · 3 min read

Illustration by IMF Alpha editorial · Reviewed by Pedro Marini

Tickers mentioned: NVDA, SNOW, MSFT, PLTR

Summary

Synthetic data is moving out of research labs and onto trading floors and loan desks. For investors, this looks like a structural growth story: more demand for compute, cloud data tooling, and the analytics that feed models. But it comes with tricky trade-offs around fidelity, bias, and compliance.

What’s happening now

Large banks and fintechs are increasingly training and validating models on synthetic datasets. Rather than sharing real customer records, teams generate artificial but plausible profiles that preserve the statistical shape of the original data (its distributions and the correlations between fields) while masking individual identities. The benefits are clear: lower privacy risk, fewer legal knots to untie, and much faster experimentation. It is not a panacea, however.

Why this matters to markets

  • Cost and scale: synthetic data makes it easier to spin up large labeled datasets, which pushes demand for GPUs and cloud capacity. Infrastructure vendors win when workloads grow.
  • Vendor opportunity: firms that can produce high-fidelity synthetic feeds, ones that customers trust enough to train production models on, can command recurring fees as customers standardize around them.
  • Regulatory gray area: banks may be able to move faster, but regulators have yet to decide whether synthetic substitutes remove accountability for bad model outcomes. That ambiguity matters.

Concrete implications (what investors should watch)

  • Suppliers of compute and data platforms stand to capture much of the economics as synthetic workloads scale.
  • Look for deals that pair legacy data holders with AI vendors; those partnerships are often the first sign of a durable revenue stream.
  • Track regulatory guidance and court challenges closely. One high-profile enforcement action could change adoption timelines overnight.

Counterpoints and risks

  • Synthetic does not equal safe. Poor generators can amplify bias or unintentionally leak patterns that sophisticated attackers can use to re-identify people.
  • Overreliance on synthetic sets can introduce fragility when real-world distributions shift — classic train/test mismatch, only louder.
  • Standards and provenance controls are immature. Without open benchmarks, buyers risk vendor lock-in and painful audits.
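The leakage risk above is often screened with a "distance to closest record" check: for every synthetic row, measure how far it sits from the nearest real row and flag near-copies, which suggest the generator memorized someone. The Python sketch below is one such heuristic under stated assumptions: numeric features only, Euclidean distance after standardizing with the real data's statistics, brute-force search, and a hypothetical threshold. Passing it does not prove privacy; it catches memorized records but not subtler attacks that infer attributes from aggregate patterns.

```python
import math
import statistics


def standardize(rows, reference):
    """Scale each column to zero mean, unit variance using the reference data's stats."""
    columns = list(zip(*reference))
    means = [statistics.fmean(c) for c in columns]
    sds = [statistics.pstdev(c) for c in columns]
    return [tuple((v - m) / s for v, m, s in zip(row, means, sds)) for row in rows]


def distance_to_closest_record(synthetic, real):
    """For each synthetic row, Euclidean distance to its nearest real row (brute force)."""
    return [min(math.dist(s, r) for r in real) for s in synthetic]


def flag_near_copies(synthetic, real, threshold):
    """Indices of synthetic rows closer than `threshold` to some real record."""
    # Standardize first so a large-valued field (e.g. score) does not dominate.
    syn_z = standardize(synthetic, real)
    real_z = standardize(real, real)
    dists = distance_to_closest_record(syn_z, real_z)
    return [i for i, d in enumerate(dists) if d < threshold]


# Hypothetical (income, credit score) rows for illustration only.
real = [(50.0, 700.0), (80.0, 760.0), (30.0, 640.0)]
synthetic = [(50.1, 700.5), (65.0, 720.0)]
print(flag_near_copies(synthetic, real, threshold=0.1))  # → [0]
```

Row 0 is nearly identical to a real borrower and gets flagged; row 1 is plausible but not a copy. The brute-force search is fine for a demo; larger datasets would need an indexed nearest-neighbor search.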

A short history lesson

There is precedent here. Financial firms once hoarded proprietary datasets as moats. Cloud platforms and APIs then turned data into a product. Synthetic data is the next twist: less about hoarding, more about sanitizing and packaging. Think of it as data wrangling 2.0, with fewer gates and more slices to sell. That shift matters because it moves value away from whoever owns the raw records and toward whoever can generate and certify trustworthy substitutes.

Examples that clarify

  • Retail lenders can stress-test credit models by generating millions of synthetic borrower journeys that include rare events missing from historical data.
  • Trading desks can simulate thousands of unusual price and order-flow scenarios without revealing counterparty details.
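The trading-desk example can be illustrated with a small Monte Carlo sketch in Python. It generates synthetic daily price paths from a lognormal random walk with rare downward jumps (a simplified jump-diffusion), then measures how often a path suffers a severe drawdown. Every parameter here (drift, volatility, jump probability, jump size) is a hypothetical assumption, and the model ignores order flow entirely; a real desk's simulator would need much more.

```python
import math
import random


def simulate_paths(s0, mu, sigma, jump_prob, jump_size, n_steps, n_paths, seed=0):
    """Synthetic daily price paths: lognormal diffusion plus rare gap-down jumps."""
    rng = random.Random(seed)
    dt = 1.0 / 252.0  # one trading day, assuming 252 trading days a year
    paths = []
    for _ in range(n_paths):
        s = s0
        path = [s]
        for _ in range(n_steps):
            z = rng.gauss(0.0, 1.0)
            log_ret = (mu - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * z
            if rng.random() < jump_prob:
                # Rare shock: multiply the price by (1 - jump_size).
                log_ret += math.log(1.0 - jump_size)
            s *= math.exp(log_ret)
            path.append(s)
        paths.append(path)
    return paths


def max_drawdown(path):
    """Largest peak-to-trough fall along a path, as a fraction of the peak."""
    peak = path[0]
    worst = 0.0
    for p in path:
        peak = max(peak, p)
        worst = max(worst, 1.0 - p / peak)
    return worst


paths = simulate_paths(s0=100.0, mu=0.05, sigma=0.2, jump_prob=0.01,
                       jump_size=0.15, n_steps=252, n_paths=1000, seed=1)
severe = sum(max_drawdown(p) > 0.30 for p in paths) / len(paths)
print(f"share of paths with a drawdown worse than 30%: {severe:.1%}")
```

Raising `jump_prob` or `jump_size` is a cheap way to stress a strategy against shocks that may be missing from historical data, which is the appeal described above.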

What to watch next — practical checklist

  • Regulatory memos on de-identification and algorithmic accountability.
  • Earnings commentary from cloud and GPU providers about how much synthetic work they see.
  • Partnerships, pilot deals, and M&A among data owners, AI vendors, and banks.

Final take

Synthetic data is not magic. It is a practical lever that can speed innovation and change where value accrues. For investors, the smarter approach is not to bet on a single vendor but to map the ecosystem: compute, platforms, synthetic-data specialists, and compliance tooling. That map will determine who captures long-term value as finance learns to build on fake data that has to behave like the real thing.
