Banks Are Buying Fiction: How Synthetic Data Is Rewiring Finance AI
Synthetic and curated datasets are emerging as the missing link between privacy, model performance, and regulatory pressure — and investors should pay attention.

Illustration by IMF Alpha editorial · Reviewed by Pedro Marini
Here’s a simple, unsettling premise: financial institutions are training mission‑critical AI on data that never happened. It sounds like science fiction, but it’s a pragmatic response to three forces colliding on Wall Street and in Silicon Valley — tighter privacy rules, a shortage of labeled rare events, and the growing cost of chewing through ever‑larger raw datasets.
Think of synthetic data as a dress rehearsal for the market. Not the live performance, but a controlled room where you can trigger rare failures without blowing up a client portfolio.
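The piece doesn't say which generation methods banks use. To make "a shortage of labeled rare events" concrete, here is one widely used and deliberately simple technique, SMOTE-style oversampling. It creates new minority-class rows (say, fraud cases) by interpolating between a real rare case and one of its nearest rare neighbours. This is a minimal sketch, assuming numeric features on comparable scales. The function name `smote_like` and the toy fraud rows are hypothetical and do not describe any vendor's product.

```python
import random
from typing import List, Sequence

Point = List[float]


def smote_like(minority: Sequence[Sequence[float]], n_new: int,
               k: int = 3, seed: int = 0) -> List[Point]:
    """Hypothetical helper: create n_new synthetic rows by interpolating between
    a randomly chosen minority-class row and one of its k nearest minority-class
    neighbours (SMOTE-style). Assumes numeric, comparably scaled features."""
    if len(minority) < 2:
        raise ValueError("need at least two minority rows to interpolate")
    rng = random.Random(seed)          # seeded so a run can be reproduced in an audit
    k = min(k, len(minority) - 1)      # cannot have more neighbours than other rows
    out: List[Point] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority rows by squared Euclidean distance to `base`.
        others = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(base, minority[j])),
        )
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()             # uniform in [0, 1)
        # The new point lies on the segment between `base` and `neighbour`.
        out.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return out


# Hypothetical fraud cases: [transaction amount, hour of day]
fraud = [[120.0, 3.0], [95.0, 4.0], [130.0, 2.0]]
synthetic_fraud = smote_like(fraud, n_new=5)
```

The design choice matters for the rehearsal analogy: interpolated rows only fill in the space *between* rare cases that already happened, so they cannot stage a failure mode the real data never contained.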
Why this matters now
Who’s building the plumbing
The tradeoffs — because there are always tradeoffs
A useful analogy: synthetic data is like a flight simulator. Pilots can practice for hurricanes they hope never to face. But a simulator only helps if it mirrors the physics; otherwise you ingrain the wrong instincts.
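One basic way to ask whether the simulator "mirrors the physics" is to compare each synthetic column with its real counterpart. The sketch below uses the two-sample Kolmogorov-Smirnov statistic, the largest gap between the two empirical distribution curves. The 0.1 cut-off and the function names are illustrative assumptions, not an industry standard or anything the article specifies.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence


def ks_statistic(real: Sequence[float], synthetic: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: the largest vertical gap between
    the empirical CDFs of the two samples (0 = identical, 1 = fully separated)."""
    if not real or not synthetic:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(real), sorted(synthetic)
    d = 0.0
    # Both ECDFs are step functions that only jump at observed values,
    # so the maximum gap is found by checking every observed value.
    for x in a + b:
        fa = bisect_right(a, x) / len(a)   # share of real values <= x
        fb = bisect_right(b, x) / len(b)   # share of synthetic values <= x
        d = max(d, abs(fa - fb))
    return d


def failing_columns(real: Dict[str, Sequence[float]],
                    synthetic: Dict[str, Sequence[float]],
                    threshold: float = 0.1) -> List[str]:
    """Hypothetical release gate: names of columns whose synthetic distribution
    drifts from the real one by more than `threshold` (an assumed cut-off)."""
    return sorted(name for name in real
                  if ks_statistic(real[name], synthetic[name]) > threshold)
```

A passing score is necessary, not sufficient: matching each column on its own says nothing about whether correlations and joint tail behaviour (the "physics" that matters in a crisis) survived generation, so column checks like this are a floor for validation rather than a verdict.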
What this means for investors and executives
A short playbook
Why the history matters
The last decade in finance was about amassing data: tick feeds, alternative datasets, web scraping. The next phase is about stewardship. Volume still matters, but lineage, quality, and defensible privacy practices will be the differentiators. This isn’t another arms race to hoard logs; it’s a move toward curated, explainable inputs.
Where we land
Synthetic data won’t replace real‑world signals. Nor should it. But as a bridge between privacy, cost, and robustness, it’s reshaping how financial AI gets built. Expect a messy transition — new vendors, new audit regimes, and a fair bit of pushback when synthetic‑trained models misbehave. For now, the sensible play is not to accept the fiction blindly, but to fold it into governance, measurement, and skeptical testing.
Pedro Marini

As financial firms swap raw customer records for engineered datasets, the winners will be those who balance speed with skeptical validation.
