Data For AI

Banks Bet on Synthetic Data: The Quiet Engine Behind Finance's AI Push

Financial institutions are shifting from proprietary datasets to synthetic data and clean rooms to train AI — a privacy-first, business-second pivot reshaping risk, vendors, and valuations.

Pedro Marini

June 6, 2026 · 4 min read

Illustration by IMF Alpha editorial · Reviewed by Pedro Marini

Tickers mentioned

SNOW +1.50% · MSFT -0.80% · GOOG +0.90% · AMZN +0.60% · PLTR -2.30%

Overview

Banks and large financial firms are quietly shifting toward synthetic data and data clean rooms to train AI. This isn’t the flashy chatbot story you read in the business press; it’s an unglamorous infrastructure change driven by privacy rules, litigation exposure, and the blunt economics of model training. Think practical, not theatrical.

Why now

Regulatory and customer pressure makes using real records riskier. State privacy laws and Gramm-Leach-Bliley obligations raise the cost and legal friction of running models on raw PII.
Generative models demand huge, diverse datasets. Pulling that volume out of production systems, properly labeled and safe, is slow and expensive.
Synthetic data can reproduce the look and statistical behavior of real records without exposing individuals, and clean rooms let multiple parties collaborate without handing over raw tables.

This trade-off, plausible realism without direct exposure, is finally good enough for many financial use cases. Not always perfect, but often usable.
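The clean-room idea can be made concrete with a toy sketch. It assumes two parties agree on a shared salt, match customers on hashed identifiers, and release only aggregate counts above a minimum group size. The function names, the salt, and the threshold of 3 are hypothetical, and a production clean room enforces these rules inside the platform rather than in application code; salted hashing alone is not a strong privacy guarantee.

```python
import hashlib
from collections import Counter

MIN_GROUP_SIZE = 3  # hypothetical suppression threshold, not a regulatory figure


def pseudonymize(customer_id: str, shared_salt: str) -> str:
    """Replace a raw identifier with a salted SHA-256 digest."""
    return hashlib.sha256((shared_salt + customer_id).encode("utf-8")).hexdigest()


def clean_room_overlap(bank_rows, partner_ids, shared_salt):
    """Count matched customers per segment and suppress small groups.

    bank_rows: iterable of (customer_id, segment) pairs
    partner_ids: iterable of customer_id strings
    Only aggregate counts leave this function, never row-level data.
    """
    partner_keys = {pseudonymize(cid, shared_salt) for cid in partner_ids}
    counts = Counter(
        segment
        for cid, segment in bank_rows
        if pseudonymize(cid, shared_salt) in partner_keys
    )
    return {seg: n for seg, n in counts.items() if n >= MIN_GROUP_SIZE}


# Invented example data, for illustration only.
bank = [("c1", "retail"), ("c2", "retail"), ("c3", "retail"),
        ("c4", "wealth"), ("c5", "wealth")]
partner = ["c1", "c2", "c3", "c4", "c9"]
overlap = clean_room_overlap(bank, partner, "demo-salt")
print(overlap)
```

Here the single matched wealth customer is suppressed, so the partner learns only that three retail customers overlap.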

Who is winning, and why it matters

Cloud providers and data-platform vendors are embedding these capabilities into the stack. Snowflake is folding clean-room capabilities into the warehouse. Hyperscalers offer managed synthetic-data services. Specialist vendors focus on privacy-preserving generation tuned to tabular financial data.

For investors, this opens a new line of recurring revenue: not only storage and compute, but packaged datasets and privacy tooling sold as ongoing services. Expect vendors to be re-priced on their ability to certify privacy-safe model training, and for partnerships between data owners and platforms to become commercially important.

Incumbents are not guaranteed this market: trust, implementation quality, and third-party validation will matter more than flashy demos.

Concrete use cases in finance

Fraud detection trained on synthetic transaction histories that preserve fraud patterns while hiding customer identities.
Credit-risk simulations built from synthetic cohorts that keep correlations across income, employment, and repayment behavior.
Stress-testing and scenario analysis where synthetic tail events are injected to probe model robustness without exposing real customer losses.

Each use case has different tolerance for approximation. Some work well with synthetic proxies; others demand careful hybrid approaches.
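As a rough illustration of a synthetic cohort that keeps a correlation intact, the sketch below fits a two-variable normal model to stand-in "real" records and samples new rows from it. Everything here is an assumption made for the example: the income and repayment columns are invented numbers, and the Gaussian model is a deliberate simplification that will not reproduce rare tail behavior. It is not how any particular vendor generates data.

```python
import math
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def fit(xs, ys):
    """Estimate means, standard deviations and correlation."""
    return {
        "mx": statistics.fmean(xs), "my": statistics.fmean(ys),
        "sx": statistics.stdev(xs), "sy": statistics.stdev(ys),
        "rho": pearson(xs, ys),
    }


def sample_synthetic(p, n, rng):
    """Draw n synthetic (x, y) pairs from the fitted bivariate normal."""
    residual = math.sqrt(max(0.0, 1 - p["rho"] ** 2))
    rows = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = p["mx"] + p["sx"] * z1
        y = p["my"] + p["sy"] * (p["rho"] * z1 + residual * z2)
        rows.append((x, y))
    return rows


rng = random.Random(7)
# Stand-in for "real" records: invented numbers, not bank data.
income = [rng.gauss(60, 15) for _ in range(2000)]
repayment = [0.5 * x + rng.gauss(0, 10) for x in income]

synthetic = sample_synthetic(fit(income, repayment), 5000, rng)
real_corr = pearson(income, repayment)
synth_corr = pearson([x for x, _ in synthetic], [y for _, y in synthetic])
print(f"real {real_corr:.2f} vs synthetic {synth_corr:.2f}")
```

No synthetic row corresponds to a specific stand-in customer, yet the income-to-repayment relationship should come out close to the original. Matching one correlation is a low bar; a real cohort would need checks across many columns and subgroups.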

Tradeoffs and real risks

Synthetic data is no cure-all. Three dangers stand out.

Performance gaps. Synthetic distributions can fail to reproduce rare or adversarial patterns that matter in finance, and that gap shows up when models meet reality.
Fingerprinting and reconstruction. A poorly built generator can leak membership signals, meaning clues about which real records it was trained on, so synthetic does not automatically mean safe.
Governance theater. It’s easy to stitch together a synthetic pipeline to check a compliance box, while downstream models remain unchecked.
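One simple screen for the leakage risk is to check whether any synthetic row sits suspiciously close to a real one. The sketch below is a heuristic under stated assumptions: features are already scaled to comparable ranges, the 0.02 threshold is hypothetical, and the rows are invented. Passing such a check does not prove a dataset is private; it only catches obvious copies.

```python
import math


def nearest_distance(row, reference_rows):
    """Euclidean distance from row to its closest reference row."""
    return min(math.dist(row, ref) for ref in reference_rows)


def flag_possible_copies(synthetic_rows, real_rows, threshold):
    """Return synthetic rows that sit within `threshold` of any real row."""
    return [
        row for row in synthetic_rows
        if nearest_distance(row, real_rows) <= threshold
    ]


# Invented, pre-scaled feature rows for illustration only.
real = [(0.2, 0.9), (0.5, 0.1), (0.8, 0.4)]
synthetic = [(0.2, 0.9), (0.51, 0.1), (0.1, 0.3)]
flagged = flag_possible_copies(synthetic, real, threshold=0.02)
print(flagged)
```

The first synthetic row is an exact copy and the second is a near copy, so both are flagged, while the third is far from every real record.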

Regulators are likely to shift scrutiny away from raw-data controls toward model outcomes: audits of explainability, third-party validation of synthetic processes, and proof that models trained on synthetic inputs behave responsibly in production.

Practical recommendations

For executives: adopt hybrids. Combine curated real samples with synthetic augmentation, and invest in rigorous out-of-sample testing to catch gaps.
For investors: watch firms that bake clean-room capabilities into core platforms and those selling labeled synthetic datasets tailored to finance verticals.
For regulators: concentrate on auditable outcomes and verifiable testing rather than upstream labels alone.

A cautious, evidence-driven approach beats rhetorical commitments.
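The hybrid recommendation can be illustrated with a deliberately tiny example: train a fraud cutoff on synthetic rows alone, then on synthetic rows plus a few curated real samples, and score both on a held-out set of real records. All amounts and labels below are invented, and the one-feature cutoff stands in for a real model only to keep the sketch short.

```python
def fit_threshold(rows):
    """Pick the amount cutoff that maximizes training accuracy.

    rows: list of (amount, is_fraud) pairs.
    """
    def accuracy(cut):
        return sum((amount >= cut) == label for amount, label in rows) / len(rows)

    candidates = sorted({amount for amount, _ in rows})
    return max(candidates, key=accuracy)


def recall(rows, cut):
    """Share of true fraud cases that the cutoff catches."""
    fraud_amounts = [amount for amount, label in rows if label]
    return sum(a >= cut for a in fraud_amounts) / len(fraud_amounts)


# Invented data: the synthetic set only contains high-value fraud.
synthetic_train = [(20, False), (35, False), (50, False),
                   (900, True), (950, True), (1000, True)]
# A few curated real samples, including one low-value fraud case.
real_sample = [(30, False), (45, False), (120, True)]
# Real records held out for testing, never used in training.
real_holdout = [(25, False), (40, False), (125, True), (140, True), (980, True)]

synth_cut = fit_threshold(synthetic_train)
hybrid_cut = fit_threshold(synthetic_train + real_sample)
print("synthetic-only recall:", recall(real_holdout, synth_cut))
print("hybrid recall:", recall(real_holdout, hybrid_cut))
```

In this toy setup the synthetic-only cutoff misses the low-value fraud that the generator never produced, and the gap only becomes visible because the test set is real. That is the point of out-of-sample testing, not a claim about how large the gap is in practice.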

In practice, the winners will be the teams that can demonstrate both utility and privacy, not the ones with the flashiest generator demos. Proving that balance, repeatedly and publicly, will be the hard part.
