AI & Finance

Wall Street’s Secret AI Fuel: Data, Not Just Models

Firms are paying top dollar for proprietary consumer and transaction data to train trading AIs — and that advantage could reshape winners, losers, and regulation.

Pedro Marini
May 30, 2026 · 3 min read
Illustration by IMF Alpha editorial · Reviewed by Pedro Marini

Tickers mentioned: NVDA +4.20% · MSFT +1.50% · JPM +0.80% · PLTR +2.00% · PYPL +0.60%

Lead

Big models grab the headlines. In trading rooms and fintech war rooms, the talk is different: it isn’t only about architecture or parameter counts. It’s about the data that feeds those models. Wall Street has quietly shifted from buying off-the-shelf models to buying sources — transaction histories, merchant receipts, app telemetry, and narrow vertical feeds — and that shift changes the calculus for returns, concentration, and regulatory exposure.

Why this matters now

  • Compute costs have dropped and cloud access is everywhere, so simply scaling models buys you less than it used to. Proprietary, clean, richly labeled data is the real bottleneck.
  • Firms that already own granular user behavior — large banks, brokerages, payments platforms — can train models that see patterns others cannot.
  • That helps explain the recent rush into data partnerships, licensing deals, and acquisitions of niche data vendors.

A long game, not a quick trick

Think of data like crude oil in the 19th century: raw, indispensable, and often messy. But unlike oil, data doesn’t keep. It goes stale unless someone curates it. A hedge fund that acquires a pile of receipts still faces cleaning, labeling, privacy engineering, and integration. Expensive. Slow. Frequently underestimated when hype takes over.

Who stands to gain

  • Incumbent financial institutions with direct customer relationships and transaction rails. Their datasets are proprietary and historically deep.
  • Cloud and AI infrastructure providers, who see higher demand to train and host these models.
  • Specialist data vendors and vertical marketplaces — now prime targets for funds and strategics hunting unique signals.

There’s nuance here: owning data becomes a moat only if you keep investing in quality and rights management.

Counterpoints and fragilities

  • More data is not a guaranteed path to alpha. Noisy or biased feeds can produce fragile models that break when regimes shift.
  • Regulatory risk is increasing. Privacy enforcement, consumer-protection rules, and limits on resale of transaction-level data could undercut business models that treat personal finance signals as commoditized assets.
  • Concentration risk matters. If a few players hoard superior feeds, market liquidity and competitive dynamics could change in ways nobody fully anticipates.
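
One rough way to probe the fragility described in the first bullet is to compare how well a signal tracks subsequent returns before and after a suspected regime change. The sketch below is illustrative only: the function names, the split point, and the toy numbers are assumptions made for this example, not anything reported by the firms discussed here.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def signal_stability(signal, forward_returns, split):
    """Correlation of signal with forward returns before and after `split`."""
    before = pearson(signal[:split], forward_returns[:split])
    after = pearson(signal[split:], forward_returns[split:])
    return before, after


# Toy data (hypothetical): the signal tracks returns in the first half
# and moves against them in the second half.
signal = [1, 2, 3, 4, 5, 6, 7, 8]
forward_returns = [0.01, 0.02, 0.03, 0.04, 0.04, 0.03, 0.02, 0.01]

before, after = signal_stability(signal, forward_returns, split=4)
print(f"before shift: {before:+.2f}, after shift: {after:+.2f}")
```

A signal whose correlation flips sign or collapses toward zero across the split is a candidate for the kind of model that breaks when conditions change. A real evaluation would use far more data and proper out-of-sample testing.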

Concrete examples (public behavior, not confidential claims)

  • Quant shops have long used alternative datasets — satellite imagery, aggregated credit-card flows, shipping logs — to complement price signals. That same playbook is being applied now to personalization and credit scoring for lenders and wealth managers.
  • Fintechs with bank-permissioned transaction streams are experimenting with on-device and hybrid models that try to preserve privacy while still extracting predictive features.

Not flashy, but effective in many cases.

What investors and risk managers should watch

  • Signs of consolidation: big licensing deals, strategic data buyouts, and exclusive supply agreements.
  • Regulatory moves on data portability, consumer consent, and algorithmic accountability. Even nonbinding guidance can change valuations quickly.
  • Metrics that matter beyond top-line growth: customer churn tied to data portability, cost of acquiring data versus marginal revenue, and how often models must be refreshed to remain predictive.

Taken together, these metrics tend to reveal how durable a data advantage really is.
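
Two of the metrics listed above, the cost of acquiring data versus marginal revenue and the model refresh cadence, reduce to simple arithmetic. The following is a minimal sketch under stated assumptions: the `DataFeed` fields and every dollar figure are hypothetical, chosen only to make the calculation concrete.

```python
from dataclasses import dataclass


@dataclass
class DataFeed:
    """Hypothetical economics of one licensed data feed (figures illustrative)."""
    annual_license_cost: float      # paid to the data vendor each year
    annual_curation_cost: float     # cleaning, labeling, privacy engineering
    annual_marginal_revenue: float  # revenue attributed to models using the feed
    refreshes_per_year: int         # retrains needed to stay predictive


def cost_to_revenue_ratio(feed: DataFeed) -> float:
    """Total data cost over marginal revenue; below 1.0 the feed pays for itself."""
    if feed.annual_marginal_revenue <= 0:
        raise ValueError("marginal revenue must be positive")
    total_cost = feed.annual_license_cost + feed.annual_curation_cost
    return total_cost / feed.annual_marginal_revenue


def days_between_refreshes(feed: DataFeed) -> float:
    """Average number of days a trained model stays in service."""
    if feed.refreshes_per_year <= 0:
        raise ValueError("refreshes_per_year must be positive")
    return 365 / feed.refreshes_per_year


# Assumed example: $2M license, $1M curation, $4M marginal revenue, monthly retrains.
feed = DataFeed(2_000_000, 1_000_000, 4_000_000, 12)
print(cost_to_revenue_ratio(feed))   # 0.75
print(round(days_between_refreshes(feed), 1))
```

A ratio creeping toward 1.0, or a refresh interval that keeps shrinking, would suggest the data advantage is getting more expensive to maintain.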

The test is simple: who controls the inputs?

AI-driven trading and fintech products are entering a phase where ownership and curation of data matter as much as model design. Investors should look past slick model demos and ask who controls the inputs — and whether those inputs will still be available next year. Regulators face a delicate trade-off between enabling useful innovation and stopping opaque scraping or commercial resale of highly personal financial footprints.

Historically, edge in finance has gone to whoever controlled scarce, high-quality inputs. Today that input is permissioned, persistent data. That’s where the next decade’s winners will be carved out — or litigated over in courts and argued over in capitals.
