Banks Are Buying Synthetic Data to Power AI — and It Changes Everything
How synthetic-data marketplaces let banks and fintechs train models with less legal exposure, and why regulators, cloud providers and chipmakers are recalibrating.

Illustration by IMF Alpha editorial · Reviewed by Pedro Marini
Synthetic data has graduated from a niche data-science trick. U.S. banks, fintechs and payments firms are increasingly using artificially generated datasets to train fraud-detection, credit-scoring and personalization models. It’s a quicker route to building AI capabilities — and a way to avoid circulating sensitive customer records across the company.
This feels like the chapter after Open Banking and the GDPR-era scramble to limit access to raw personal data. Before, teams faced a blunt choice: use real data and wrestle with compliance, or use fake data and accept weak models. Synthetic data promises a middle path — appealing, but with its own messiness.
A mid-sized regional bank built a synthetic customer dataset to retrain a fraud detector without engaging its privacy office. The result: fewer legal reviews and faster iteration. In practice, though, the team still kept a holdout of real flagged frauds as a backstop, because otherwise it risked missing unusual attack patterns. It felt like belt-and-suspenders engineering, but it was necessary.
Synthetic data marks an inflection point rather than a cure-all. In regulated finance it can shrink development cycles and reduce exposure, but it also creates new governance and audit needs and raises fresh questions about model trust. The next 12–24 months will show whether it becomes standard practice or simply another outsourced headache regulators will have to police.

Startups, along with Unity, Nvidia and Snowflake, are racing to supply synthetic datasets. The shift will cut costs, complicate compliance, and reshape who profits from AI.