
Synthetic Data

Banks Are Buying Synthetic Data to Power AI — and It Changes Everything

How synthetic-data marketplaces let banks and fintechs train models with less privacy and legal exposure, and why regulators, cloud providers and chipmakers are recalibrating.

Pedro Marini

June 13, 2026 · 4 min read


Illustration by IMF Alpha editorial · Reviewed by Pedro Marini


Tickers mentioned: SNOW +1.20% · NVDA +3.50% · MSFT -0.80% · PLTR -2.00%

Why this matters now

Synthetic data has moved beyond being a niche data-science trick. U.S. banks, fintechs and payments firms are increasingly using artificially generated datasets to train fraud-detection, credit-scoring and personalization models. It offers a quicker route to building AI capabilities, and a way to avoid circulating sensitive customer records across the company.

What it does

Produces realistic but artificial records that reproduce the statistical patterns of real customers while stripping direct identifiers. It is not the same as perfect anonymization, but it yields far more realistic training data than toy datasets do.
Useful in finance for fraud simulations, stress-testing models, product QA, and developer sandboxes for regulated systems where you don’t want live PII floating around.
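The basic idea can be sketched in a few lines of Python. This is a minimal illustration, not how commercial generators work: it assumes purely numeric features, hypothetical field names (`customer_id`, `name`, `amount`, `tx_per_day`), and a single multivariate normal model fitted to the real rows. Direct identifiers are dropped, then new records are sampled that share the real data's means and correlations.

```python
import math
import random

# Hypothetical field names: these stand in for direct identifiers.
IDENTIFIERS = {"customer_id", "name"}


def fit_gaussian(rows, features):
    """Estimate the mean vector and sample covariance of numeric features."""
    n = len(rows)
    means = [sum(r[f] for r in rows) / n for f in features]
    cov = [[0.0] * len(features) for _ in features]
    for r in rows:
        d = [r[f] - m for f, m in zip(features, means)]
        for i in range(len(features)):
            for j in range(len(features)):
                cov[i][j] += d[i] * d[j] / (n - 1)
    return means, cov


def cholesky(a):
    """Lower-triangular L with L @ L.T == a (a must be positive definite)."""
    k = len(a)
    low = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = sum(low[i][m] * low[j][m] for m in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def synthesize(rows, n_out, seed=0):
    """Drop identifiers, fit a multivariate normal, and sample new records."""
    features = sorted(k for k in rows[0] if k not in IDENTIFIERS)
    means, cov = fit_gaussian(rows, features)
    low = cholesky(cov)
    rng = random.Random(seed)
    out = []
    for _ in range(n_out):
        z = [rng.gauss(0.0, 1.0) for _ in features]
        # Correlate independent normals via the Cholesky factor, then shift.
        x = [m + sum(low[i][j] * z[j] for j in range(i + 1))
             for i, m in enumerate(means)]
        out.append(dict(zip(features, x)))
    return out


# Usage with made-up "real" customers whose spend tracks activity.
rng = random.Random(42)
real = []
for i in range(1_000):
    tx = rng.gauss(5, 1)
    real.append({"customer_id": i, "name": f"cust-{i}",
                 "tx_per_day": tx, "amount": 20 * tx + rng.gauss(0, 5)})
synthetic = synthesize(real, n_out=5_000)
# Synthetic records carry only "amount" and "tx_per_day", no identifiers.
```

A single Gaussian keeps averages and correlations but smooths away rare, extreme records, which is exactly the weakness discussed under distributional gaps. Production tools use richer generative models and add privacy testing on top.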

A short history, with a twist

This feels like the chapter after Open Banking and the GDPR-era scramble to limit access to raw personal data. Before, teams faced a blunt choice: use real data and wrestle with compliance, or use fake data and accept weak models. Synthetic data promises a middle path, which is appealing but brings its own complications.

Who wins and who watches

Winners: cloud and data-infrastructure vendors that build marketplaces and pipelines. Snowflake-style clean rooms and APIs are likely to become the distribution layer financial institutions favor.
Also winners: chipmakers and cloud GPU providers, since better generation models mean more compute demand.
Watchers: regulators and compliance teams, who still need to decide case-by-case whether a synthetic copy counts as personal data.

Why it’s not a cure-all

Distributional gaps. Synthetic datasets often capture average behavior well but miss rare events — precisely the tail cases fraud models must catch.
Auditability. Regulators will ask for provenance: was the generator audited? Which seed data were used? These questions matter for model-risk frameworks.
Vendor lock-in. Several providers bundle generation, validation and hosting into closed stacks that can become single points of failure.
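To make the distributional-gap point concrete, here is a toy Python sketch. The transaction amounts, the 0.5% fraud rate and the 5,000 threshold are invented for illustration, and fitting one normal distribution is a deliberately naive generator, not a description of how any vendor works. The check itself is simple: compare how often real and synthetic records land in the tail that matters.

```python
import random
import statistics


def make_real_amounts(n, fraud_rate, seed=0):
    """Hypothetical transaction amounts: mostly ordinary, a few extreme frauds."""
    rng = random.Random(seed)
    amounts = []
    for _ in range(n):
        if rng.random() < fraud_rate:
            amounts.append(rng.uniform(5_000, 20_000))  # rare, large fraud
        else:
            amounts.append(rng.gauss(80, 25))  # everyday spending
    return amounts


def naive_synthetic(amounts, n, seed=1):
    """Fit a single normal distribution (mean, stdev) and sample from it."""
    mu = statistics.fmean(amounts)
    sigma = statistics.stdev(amounts)
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) for _ in range(n)]


def tail_rate(amounts, threshold):
    """Share of records at or above the threshold."""
    return sum(a >= threshold for a in amounts) / len(amounts)


real = make_real_amounts(50_000, fraud_rate=0.005)
synth = naive_synthetic(real, 50_000)
print(f"real tail rate:      {tail_rate(real, 5_000):.4f}")
print(f"synthetic tail rate: {tail_rate(synth, 5_000):.4f}")
```

On data shaped like this, the naive synthetic sample contains essentially no records above the threshold, even though roughly 0.5% of the real records sit there. Better generators narrow the gap, but a tail-rate comparison is a cheap sanity check before trusting a synthetic set for fraud work.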

A real-world note

A mid-sized regional bank built a synthetic customer dataset to retrain a fraud detector without engaging its privacy office. The result: fewer legal reviews and faster iteration. In practice, though, the team still kept a holdout of real flagged frauds as a backstop; without it, they risked missing unusual attack patterns. It may look like belt-and-suspenders engineering, but it was necessary.
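A backstop like the bank's can be expressed as a release gate: a model trained on synthetic data ships only if it still catches the real, confirmed frauds held out from training. The Python sketch below is hypothetical throughout; the `Transaction` record, the stand-in detector and the 80% recall floor are illustrative assumptions, since the bank's actual metric and threshold are not described.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Transaction:
    amount: float
    foreign: bool
    is_fraud: bool  # confirmed label from real investigations


def holdout_recall(detector: Callable[[Transaction], bool],
                   holdout: Sequence[Transaction]) -> float:
    """Share of confirmed real frauds in the holdout that the detector flags."""
    frauds = [tx for tx in holdout if tx.is_fraud]
    if not frauds:
        raise ValueError("holdout contains no confirmed frauds")
    return sum(detector(tx) for tx in frauds) / len(frauds)


def release_gate(detector, holdout, min_recall=0.8):
    """Approve a synthetic-trained detector only if it still catches real frauds."""
    recall = holdout_recall(detector, holdout)
    return recall >= min_recall, recall


# Stand-in for a model trained on synthetic data (hypothetical rule).
def synthetic_trained(tx: Transaction) -> bool:
    return tx.amount >= 1_000


holdout = [
    Transaction(1_500, False, True),
    Transaction(2_000, True, True),
    Transaction(300, True, True),  # small foreign fraud the model misses
    Transaction(50, False, False),
]
ok, recall = release_gate(synthetic_trained, holdout)
# recall is 2/3 here, below the 0.8 floor, so ok is False
```

Keeping the gate separate from training means the real holdout never leaks into the synthetic pipeline; it only answers the question of whether the new model would have caught attacks the bank has actually seen.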

What this means for investors and operators

Investors: don’t bet only on pure-play synthetic-data startups. The safer bets are infrastructure plays such as secure data sharing, cloud GPU capacity and model-validation tooling: the plumbing that makes synthetic workflows reliable.
Operators: use synthetic data as an accelerator, not a compliance shield. Combine it with traditional privacy engineering, red-teaming, continuous monitoring and human review.

The upshot

Synthetic data marks an inflection point rather than a cure-all. In regulated finance it can shrink development cycles and reduce exposure, but it also creates new governance and audit needs and raises fresh questions about model trust. The next 12–24 months will show whether it becomes standard practice or simply another outsourced headache regulators will have to police.
