
Banks Are Buying Synthetic Data to Power AI — and It Changes Everything

How synthetic-data marketplaces let banks and fintechs train models with less legal exposure, and why regulators, cloud providers and chipmakers are recalibrating.

Pedro Marini
June 13, 2026 · 4 min read
Illustration by IMF Alpha editorial · Reviewed by Pedro Marini

Tickers mentioned: SNOW +1.20% · NVDA +3.50% · MSFT -0.80% · PLTR -2.00%

Why this matters now

Synthetic data has graduated from a niche data-science trick. U.S. banks, fintechs and payments firms are increasingly using artificially generated datasets to train fraud-detection, credit-scoring and personalization models. It’s a quicker route to building AI capabilities — and a way to avoid circulating sensitive customer records across the company.

What it does

  • Produces realistic but artificial records that reproduce the statistical patterns of real customers while stripping direct identifiers. This is not the same as perfect anonymization, but it yields far more realistic training data than toy datasets do.
  • Useful in finance for fraud simulations, stress-testing models, product QA, and developer sandboxes for regulated systems where you don’t want live PII floating around.
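The idea behind the first bullet can be sketched in a few lines. This is a deliberately simplified illustration, not how any vendor's generator works: it assumes hypothetical records with `customer_id`, `amount` and `is_fraud` fields, fits a log-normal distribution to amounts, and samples each column independently. Production generators model the joint structure across columns; this toy version does not.

```python
import math
import random
import statistics

def fit_lognormal(amounts):
    """Estimate the mean and standard deviation of log(amount)."""
    logs = [math.log(a) for a in amounts]
    return statistics.mean(logs), statistics.stdev(logs)

def synthesize(records, n, seed=0):
    """Sample n artificial records that mimic aggregate statistics.

    Assumption: columns are treated as independent, which real
    generators do not do. Real identifiers are never copied.
    """
    rng = random.Random(seed)
    mu, sigma = fit_lognormal([r["amount"] for r in records])
    fraud_rate = sum(r["is_fraud"] for r in records) / len(records)
    return [
        {
            "customer_id": f"synthetic-{i}",  # placeholder, not a real ID
            "amount": round(rng.lognormvariate(mu, sigma), 2),
            "is_fraud": rng.random() < fraud_rate,
        }
        for i in range(n)
    ]

# Hypothetical seed data, for illustration only.
real_records = [
    {"customer_id": "C-1001", "amount": 42.50, "is_fraud": False},
    {"customer_id": "C-1002", "amount": 18.20, "is_fraud": False},
    {"customer_id": "C-1003", "amount": 960.00, "is_fraud": True},
    {"customer_id": "C-1004", "amount": 75.10, "is_fraud": False},
]
synthetic_records = synthesize(real_records, n=5, seed=7)
```

The synthetic rows carry no real customer IDs, but that alone does not settle whether they count as personal data, which is the open question the article returns to below.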

A short history, with a twist

This feels like the chapter after Open Banking and the GDPR-era scramble to limit access to raw personal data. Before, teams faced a blunt choice: use real data and wrestle with compliance, or use fake data and accept weak models. Synthetic data promises a middle path — appealing, but with its own messiness.

Who wins and who watches

  • Winners: cloud and data-infrastructure vendors that build marketplaces and pipelines. Snowflake-style clean rooms and APIs are the distribution layer financial institutions are likely to favor.
  • Also winners: chipmakers and cloud GPU providers. Better generation models just mean more compute demand.
  • Watchers: regulators and compliance teams, who still need to decide case by case whether a synthetic copy counts as personal data.

Why it’s not a cure-all

  • Distributional gaps. Synthetic datasets often capture average behavior well but miss rare events — precisely the tail cases fraud models must catch.
  • Auditability. Regulators will ask for provenance: was the generator audited? Which seed data were used? These questions matter for model-risk frameworks.
  • Vendor lock-in. Several providers bundle generation, validation and hosting into closed stacks that can become single points of failure.
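The distributional-gap point is one a team can check directly. A minimal sketch, assuming plain lists of transaction amounts and a hypothetical helper name `tail_coverage`: take a high quantile of the real data as a threshold, then compare how much of each dataset sits at or above it. A synthetic share far below the real share suggests the generator is smoothing away the tail.

```python
def tail_coverage(real, synthetic, q=0.99):
    """Compare how much of each dataset lies in the real data's upper tail.

    Returns (threshold, real_share, synthetic_share), where threshold is
    the real value at quantile q and each share is the fraction of
    values at or above that threshold.
    """
    if not real or not synthetic:
        raise ValueError("both datasets must be non-empty")
    ordered = sorted(real)
    index = min(int(q * len(ordered)), len(ordered) - 1)
    threshold = ordered[index]
    real_share = sum(x >= threshold for x in real) / len(real)
    synthetic_share = sum(x >= threshold for x in synthetic) / len(synthetic)
    return threshold, real_share, synthetic_share

# Hypothetical amounts: the real data has a few very large transactions,
# while the synthetic data clusters around typical values.
real_amounts = [20, 35, 40, 55, 60, 75, 80, 95, 4200, 9800]
synthetic_amounts = [25, 30, 45, 50, 65, 70, 85, 90, 100, 110]
threshold, real_share, synthetic_share = tail_coverage(
    real_amounts, synthetic_amounts, q=0.8
)
```

This checks a single numeric column only. It says nothing about rare combinations of features, which is often where fraud patterns actually live.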

A real-world note

A mid-sized regional bank built a synthetic customer dataset to retrain a fraud detector without engaging its privacy office. The result: fewer legal reviews and faster iteration. In practice, though, the team still kept a holdout of real flagged frauds as a backstop, because without it they risked missing unusual attack patterns. It felt like belt-and-suspenders engineering, but it was necessary.
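The backstop described here amounts to scoring the retrained model against real fraud cases it never saw. A minimal sketch, assuming a detector is any function that returns True when it flags a transaction; the threshold rule and the sample transactions are hypothetical, not the bank's actual setup.

```python
def recall_on_real_holdout(detector, holdout_frauds):
    """Fraction of real, confirmed frauds that the detector flags."""
    if not holdout_frauds:
        raise ValueError("holdout must contain at least one fraud case")
    caught = sum(1 for tx in holdout_frauds if detector(tx))
    return caught / len(holdout_frauds)

def amount_threshold_detector(tx):
    """Stand-in for a model trained on synthetic data (hypothetical rule)."""
    return tx["amount"] > 1000

# Hypothetical holdout: two large frauds plus two small card-testing
# charges, the kind of unusual pattern a synthetic set may not contain.
real_fraud_holdout = [
    {"amount": 5000.00},
    {"amount": 2500.00},
    {"amount": 1.00},
    {"amount": 0.50},
]
recall = recall_on_real_holdout(amount_threshold_detector, real_fraud_holdout)
```

Tracking this number across retraining runs gives an early signal when a model built on synthetic data starts missing attack patterns present in real cases.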

What this means for investors and operators

  • Investors: don’t bet only on pure-play synthetic-data startups. The safer bets are infrastructure plays — secure data-sharing, cloud GPU capacity, model validation tooling — the plumbing that makes synthetic workflows reliable.
  • Operators: use synthetic data as an accelerator, not a compliance shield. Combine it with traditional privacy engineering, red-teaming, continuous monitoring and human review.

The upshot

Synthetic data marks an inflection point rather than a cure-all. In regulated finance it can shrink development cycles and reduce exposure, but it also creates new governance and audit needs and raises fresh questions about model trust. The next 12–24 months will show whether it becomes standard practice or simply another outsourced headache regulators will have to police.
