Synthetic Data

Why Synthetic Data Is the New Commodity Banks Are Buying

Financial firms are swapping raw customer records for algorithmically generated datasets. It lowers legal risk, speeds model building—and forces new trade-offs.

Pedro Marini

June 15, 2026 · 4 min read

Illustration by IMF Alpha editorial · Reviewed by Pedro Marini

The shift is underway. Over the past 18 months, a quiet but consequential migration has taken hold inside banks, payments firms, and fintechs: engineering teams that once begged for more real customer rows are now asking for better synthetic ones.

This is not a novelty stunt. Synthetic datasets, produced by generative adversarial networks (GANs), variational autoencoders (VAEs), and probabilistic samplers, let institutions train models without moving sensitive personal records. For compliance officers juggling the California Privacy Rights Act (CPRA) and the Gramm-Leach-Bliley Act (GLBA), and for ML teams that keep running into data-access bottlenecks, that pairing of usable training data with less exposure of real records is hard to ignore.
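To make the simplest of those techniques concrete, here is a minimal Python sketch of a probabilistic sampler: it fits a log-normal distribution to a column of transaction amounts and draws new values from it. The data, the helper names, and the log-normal choice are illustrative assumptions, not any bank's or vendor's method. Unlike a GAN or VAE, it models a single column and ignores relationships between fields.

```python
import math
import random
import statistics

# Illustrative sketch: hypothetical helper names and stand-in data.

def fit_lognormal(amounts):
    """Estimate log-normal parameters (mu, sigma) from positive amounts."""
    logs = [math.log(a) for a in amounts]
    return statistics.fmean(logs), statistics.stdev(logs)

def sample_lognormal(mu, sigma, n, seed=0):
    """Draw n synthetic amounts from the fitted distribution (seeded for repeatability)."""
    rng = random.Random(seed)
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]

# Hypothetical "real" transaction amounts in USD.
real_amounts = [12.5, 40.0, 8.99, 230.0, 15.75, 62.3, 19.99, 5.0, 110.0, 33.4]

mu, sigma = fit_lognormal(real_amounts)
synthetic_amounts = sample_lognormal(mu, sigma, n=1000)
print(f"mu={mu:.3f} sigma={sigma:.3f}")
print(f"synthetic median ~ {statistics.median(synthetic_amounts):.2f}")
```

No real record leaves the fitting step; only the two parameters do. That is also why such a simple sampler tends to miss the quirks discussed below.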

Why now

Speed: Generating labeled samples collapses weeks of wrangling into days. Iteration cycles get tighter.
Privacy: Well-constructed synthetic data reduces the attack surface for leaks and makes consent and sharing simpler.
Cost: Annotation and procurement budgets stretch further when synthetic sets can stand in for expensive labeling campaigns.

Still, the upside comes with some important caveats.

What synthetic data buys you—and where it falls short

Use cases where synthetic data really helps

Fraud and anomaly detection: you can synthesize rare-but-important attack scenarios that barely appear in production logs.
Model validation and stress testing: generate extreme yet plausible customer behaviors to probe resilience.
Feature engineering and prototyping: get early signals without waiting for legal sign-offs or full production datasets.
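The fraud use case can be sketched in a few lines. The snippet below is a hypothetical illustration rather than a production recipe. It injects labeled "card testing" bursts (many tiny charges on one account within minutes) into otherwise ordinary traffic, so a detector sees far more examples of the pattern than production logs would supply. The record layout, amounts, and burst sizes are assumptions chosen for readability.

```python
import random
from dataclasses import dataclass

# Hypothetical record layout for illustration only.
@dataclass
class Txn:
    account: str
    seconds: int    # seconds since start of a one-day window
    amount: float   # USD
    is_fraud: bool

def normal_txns(n, rng):
    """Background traffic: purchases of ordinary size spread across the day."""
    return [Txn(f"acct{rng.randrange(500)}", rng.randrange(86_400),
                round(rng.uniform(5, 300), 2), False) for _ in range(n)]

def card_testing_burst(account, start, rng, attempts=20):
    """Assumed scenario: many tiny charges on one account within five minutes."""
    return [Txn(account, start + rng.randrange(300),
                round(rng.uniform(0.5, 2.0), 2), True) for _ in range(attempts)]

rng = random.Random(42)
dataset = normal_txns(10_000, rng)
for i in range(25):  # inject 25 synthetic bursts that real logs may barely contain
    dataset += card_testing_burst(f"victim{i}", rng.randrange(86_000), rng)
rng.shuffle(dataset)

fraud_share = sum(t.is_fraud for t in dataset) / len(dataset)
print(f"{len(dataset)} rows, fraud share {fraud_share:.1%}")
```

The design choice is deliberate: the rare class is written by hand from a described scenario rather than learned from data, which is exactly what makes it useful when real examples are scarce.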

Limits and risks

Tail fidelity: generators often smooth over rare interactions, and those long-tail quirks are often exactly the signals that distinguish real-world fraud.
Model leakage: a sloppy generator can end up regurgitating private rows, trading one privacy problem for another.
Regulatory skepticism: auditors and examiners still prefer line-level provenance or strict lineage for evidence.
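One common sanity check for the leakage risk is a distance-to-closest-record test. For each synthetic row, find the nearest real row and flag exact or near-exact matches. The sketch below assumes numeric features already scaled to comparable ranges and uses Euclidean distance. The feature names, data, and threshold are hypothetical. It also compares every pair by brute force, which is fine for a pilot-sized sample but not for millions of rows.

```python
import math

def closest_distance(row, real_rows):
    """Euclidean distance from one synthetic row to its nearest real row."""
    return min(math.dist(row, r) for r in real_rows)

def flag_copies(synthetic_rows, real_rows, threshold=1e-6):
    """Indices of synthetic rows that (nearly) duplicate some real row."""
    return [i for i, row in enumerate(synthetic_rows)
            if closest_distance(row, real_rows) <= threshold]

# Hypothetical normalized features: (age, balance, monthly_txn_count), each scaled to 0-1.
real = [(0.31, 0.72, 0.10), (0.55, 0.12, 0.40), (0.80, 0.45, 0.05)]
synthetic = [(0.30, 0.70, 0.12), (0.55, 0.12, 0.40), (0.62, 0.33, 0.21)]

print(flag_copies(synthetic, real))  # prints [1]: row 1 is a verbatim copy of a real row
```

Raising the threshold catches near-copies as well (row 0 sits about 0.03 from a real row). Choosing that cutoff is a policy decision, not a purely technical one.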

A short history helps explain the rush

Ten years ago, ML projects in finance were routinely stalled by legal reviews and slow anonymization. That split organizations into two camps: conservative risk managers who tightly controlled data, and product groups that quietly built shadow datasets to move faster. Synthetic data promises a reconciliation: analytical freedom without handing out raw personally identifiable information (PII). That promise, unsurprisingly, has attracted investor interest and vendor activity. Cloud providers and GPU makers, the firms that power large-scale generation, stand to benefit as enterprises adopt these tools.

Where the market may split

Over the next 12–24 months I expect two distinct paths to form:

Enterprise-grade platforms that prioritize governance, audit trails, and provable privacy guarantees. Those will appeal to banks and regulated fintechs.
Lightweight, open-source toolkits that startups and research labs prefer for flexibility and speed, trading governance for agility.

It mirrors a familiar division in enterprise software: the safer, compliance-first route, and the scrappier innovation track.

What this means for investors and execs

Watch for real adoption signals: vendor deals with major banks, SOC 2 / ISO certifications, and independent privacy audits will be more telling than flashy demos.
Monitor compute demand. Large-scale synthetic generation is GPU-heavy; rising adoption supports hardware and cloud providers.
Be skeptical of marketing claims. Ask for concrete metrics: how closely do synthetic distributions match the originals? Do stress tests measure rare-event fidelity?
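What might such metrics look like? One widely used option, swapped in here because the article names no specific metric, is the two-sample Kolmogorov-Smirnov statistic. It is the largest gap between the empirical cumulative distributions of real and synthetic values, where 0 means identical and 1 means fully separated. Pairing it with a tail check, the share of synthetic values beyond the real data's 99th percentile, speaks to rare-event fidelity. The simulated data and the "too-thin tails" generator below are assumptions for illustration only.

```python
import bisect
import random

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    gap = 0.0
    for x in a + b:  # the max gap always occurs at one of the observed points
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap

def tail_share(values, cutoff):
    """Fraction of values at or above a cutoff, e.g. the real data's 99th percentile."""
    return sum(v >= cutoff for v in values) / len(values)

rng = random.Random(7)
real = [rng.lognormvariate(3.0, 1.2) for _ in range(5000)]       # simulated "real" amounts
synthetic = [rng.lognormvariate(3.0, 0.9) for _ in range(5000)]  # generator with thin tails

cutoff = sorted(real)[int(0.99 * len(real))]
print(f"KS statistic: {ks_statistic(real, synthetic):.3f}")
print(f"tail share real={tail_share(real, cutoff):.2%} "
      f"synthetic={tail_share(synthetic, cutoff):.2%}")
```

The two numbers answer different questions. A modest KS gap can coexist with a synthetic set that almost never produces the extreme values a stress test needs, which is why the tail check is worth requesting separately.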

The practical upshot

Synthetic data is not a magic bullet, but it is a useful lever. For US financial firms constrained by privacy rules and competitive pressure, it offers a way to move faster without shrugging off compliance. Expect cautious, measured adoption driven by governance features rather than a wholesale replacement of live-data pipelines.

If you work in ML or risk at a bank, start small: pilot synthetic datasets on noncritical models, require independent privacy validation, and treat governance as the gating factor. The first teams that get that balance right will unlock real advantage without inviting regulatory headaches.

