Synthetic Data

Banks Are Buying Fiction: How Synthetic Data Is Rewiring Finance AI

Synthetic and curated datasets are emerging as the missing link between privacy, model performance, and regulatory pressure — and investors should pay attention.

Pedro Marini

June 14, 2026 · 4 min read

Illustration by IMF Alpha editorial · Reviewed by Pedro Marini

Here’s a simple, unsettling premise: financial institutions are training mission‑critical AI on data that never happened. It sounds like science fiction, but it’s a pragmatic response to three forces colliding on Wall Street and in Silicon Valley — tighter privacy rules, a shortage of labeled rare events, and the growing cost of chewing through ever‑larger raw datasets.

Think of synthetic data as a dress rehearsal for the market. Not the live performance, but a controlled room where you can trigger rare failures without blowing up a client portfolio.

Why this matters now

Privacy and compliance stopped being checkboxes and joined the boardroom conversation. Banks need datasets that keep analytic value while stripping out personally identifiable details.
Tail events are, by definition, uncommon. Synthetic generation lets models encounter more of the extreme scenarios they’ll have to endure.
Cloud pricing, together with new platforms for on‑demand generation and curated marketplaces, makes this practical for mid‑tier banks and fintechs, not only the hyperscalers.

Who’s building the plumbing

Data marketplaces and platforms are positioning themselves as the distribution layer for curated and synthetic sets. Expect friction between data‑native startups and the cloud incumbents.
Model‑infrastructure vendors and GPU suppliers are selling the flip side: more compute to train on generated data, along with validation tooling.
You’ve seen early examples: Snowflake‑style marketplaces hosting curated datasets; lakehouse vendors packaging labeled financial feeds; niche startups offering synthetic transaction streams tuned for fraud detection. Components exist. The winner will be whoever stitches generation, validation, and deployment into an automated workflow.

The tradeoffs — because there are always tradeoffs

Upside: faster iteration, fewer privacy headaches, better stress‑testing of edge cases.
Downside: synthetic data can bake in designer biases. Train on worlds that are too neat and models will stumble hard when the real, messy world returns.
Regulatory risk: supervisors care about outcomes, not methodology. A discriminatory or unstable model invites scrutiny even if the training data were entirely synthetic.

A useful analogy: synthetic data is like a flight simulator. Pilots can practice for hurricanes they hope never to face. But a simulator only helps if it mirrors the physics; otherwise you ingrain the wrong instincts.

What this means for investors and executives

Keep an eye on partnerships between data marketplaces and model vendors. They’re an early indicator of product‑market fit.
Demand verifiable validation tooling: methods that compare synthetic distributions against held‑out real data, plus independent audits of model behavior.
Be skeptical of vendors promising perfect privacy and perfect accuracy. The practical winners will accept imperfect guarantees in exchange for rigorous measurement and governance.
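
What might "comparing synthetic distributions with holdout real data" look like in practice? One common option, offered here as a minimal sketch rather than any vendor's actual method, is the two-sample Kolmogorov-Smirnov statistic: the largest gap between the two empirical cumulative distributions. The transaction amounts below are simulated stand-ins (the log-normal parameters, sample sizes, and variable names are all illustrative assumptions), and a real audit would look at many features and their joint behavior, not one column.

```python
from bisect import bisect_right
import random


def ks_statistic(real, synthetic):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical CDFs of the two samples (0 = identical, 1 = disjoint)."""
    real_sorted = sorted(real)
    synth_sorted = sorted(synthetic)
    if not real_sorted or not synth_sorted:
        raise ValueError("both samples must be non-empty")
    max_gap = 0.0
    # The gap between two step functions can only peak at an observed value.
    for x in real_sorted + synth_sorted:
        cdf_real = bisect_right(real_sorted, x) / len(real_sorted)
        cdf_synth = bisect_right(synth_sorted, x) / len(synth_sorted)
        max_gap = max(max_gap, abs(cdf_real - cdf_synth))
    return max_gap


# Hypothetical data: simulated "transaction amounts", not real bank records.
rng = random.Random(7)
real_holdout = [rng.lognormvariate(3.0, 1.0) for _ in range(2000)]
faithful_synth = [rng.lognormvariate(3.0, 1.0) for _ in range(2000)]
# A "too neat" generator: same center, but far too little spread in the tails.
too_neat_synth = [rng.lognormvariate(3.0, 0.4) for _ in range(2000)]

gap_faithful = ks_statistic(real_holdout, faithful_synth)
gap_too_neat = ks_statistic(real_holdout, too_neat_synth)
print(f"faithful generator gap: {gap_faithful:.3f}")
print(f"too-neat generator gap: {gap_too_neat:.3f}")
```

The too-neat generator should show a visibly larger gap than the faithful one. A single number like this is a starting point for the conversation with a vendor, not a certificate of quality.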

A short playbook

For risk officers: require adversarial testing of any model trained on synthetic sets, and insist on a small, audited real‑data holdout to check performance.
For CTOs: build pipelines that can label, generate, version, and reproduce datasets. Prioritize reproducibility over one‑off gains.
For investors: favor firms selling verifiable tooling and strong data governance, not those pitching synthetic generation as a checkbox feature.
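
For the CTO item above, here is a rough sketch of what "version and reproduce" could mean at its simplest: generate from a seeded configuration and record content hashes in a manifest, so anyone can regenerate the set and confirm it matches. The generator, field names, and parameters are hypothetical toys chosen for illustration; a production pipeline would add storage, lineage, and access controls on top.

```python
import hashlib
import json
import random


def generate_transactions(config):
    """Generate toy synthetic transactions deterministically from a config."""
    rng = random.Random(config["seed"])  # local RNG, no hidden global state
    rows = []
    for i in range(config["n_rows"]):
        is_fraud = rng.random() < config["fraud_rate"]
        # Toy assumption: fraudulent transactions have larger typical amounts.
        mu = config["fraud_mu"] if is_fraud else config["normal_mu"]
        amount = round(rng.lognormvariate(mu, config["sigma"]), 2)
        rows.append({"id": i, "amount": amount, "is_fraud": is_fraud})
    return rows


def fingerprint(obj):
    """SHA-256 over canonical JSON, so equal content yields an equal hash."""
    payload = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_versioned_dataset(config):
    """Return the rows plus a manifest that identifies exactly how they were made."""
    rows = generate_transactions(config)
    manifest = {
        "config": config,
        "config_hash": fingerprint(config),
        "data_hash": fingerprint(rows),
        "n_rows": len(rows),
    }
    return rows, manifest


# Hypothetical configuration; every value here is an illustrative assumption.
CONFIG = {
    "seed": 2026,
    "n_rows": 1000,
    "fraud_rate": 0.02,
    "normal_mu": 3.0,
    "fraud_mu": 5.0,
    "sigma": 0.8,
}

rows, manifest = build_versioned_dataset(CONFIG)
rows_again, manifest_again = build_versioned_dataset(CONFIG)
print("reproducible:", manifest["data_hash"] == manifest_again["data_hash"])
```

Storing the manifest alongside each trained model gives auditors a way to tie a model's behavior back to the exact synthetic set it saw.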

Why the history matters

The last decade in finance was about amassing data: tick feeds, alternative datasets, web scraping. The next phase is about stewardship. Volume still matters, but lineage, quality, and defensible privacy practices will be the differentiators. This isn’t another arms race to hoard logs; it’s a move toward curated, explainable inputs.

Where we land

Synthetic data won’t replace real‑world signals. Nor should it. But as a bridge between privacy, cost, and robustness, it’s reshaping how financial AI gets built. Expect a messy transition — new vendors, new audit regimes, and a fair bit of pushback when synthetic‑trained models misbehave. For now, the sensible play is not to accept the fiction blindly, but to fold it into governance, measurement, and skeptical testing.
