AI & Finance

Wall Street’s Secret AI Fuel: Data, Not Just Models

Firms are paying top dollar for proprietary consumer and transaction data to train trading AIs — and that advantage could reshape winners, losers, and regulation.

Pedro Marini

May 30, 2026 · 3 min read

Illustration by IMF Alpha editorial · Reviewed by Pedro Marini

Lead

Big models grab the headlines. In trading rooms and fintech war rooms, the talk is different: it isn’t only about architecture or parameter counts. It’s about the data that feeds those models. Wall Street has quietly shifted from buying off-the-shelf models to buying sources — transaction histories, merchant receipts, app telemetry, and narrow vertical feeds — and that shift changes the calculus for returns, concentration, and regulatory exposure.

Why this matters now

Compute costs have dropped and cloud access is everywhere, so simply scaling models buys you less than it used to. Proprietary, clean, richly labeled data is the real bottleneck.
Firms that already own granular user behavior — large banks, brokerages, payments platforms — can train models that see patterns others cannot.
That helps explain the recent rush into data partnerships, licensing deals, and acquisitions of niche data vendors.

A long game, not a quick trick

Think of data like crude oil in the 19th century: raw, indispensable, and often messy. But unlike oil, data does not hold its value in storage: it goes stale unless someone keeps curating it. A hedge fund that acquires a pile of receipts still faces cleaning, labeling, privacy engineering, and integration before any of it becomes a usable signal. Expensive. Slow. Frequently underestimated when hype takes over.

Who stands to gain

Incumbent financial institutions with direct customer relationships and transaction rails. Their datasets are proprietary and historically deep.
Cloud and AI infrastructure providers, who see higher demand to train and host these models.
Specialist data vendors and vertical marketplaces — now prime targets for funds and strategics hunting unique signals.

There’s nuance here: owning data is not a moat by itself. It becomes one only if you keep investing in quality and rights management.

Counterpoints and fragilities

More data is not a guaranteed path to alpha. Noisy or biased feeds can produce fragile models that break when regimes shift.
Regulatory risk is increasing. Privacy enforcement, consumer-protection rules, and limits on resale of transaction-level data could undercut business models that treat personal finance signals as commoditized assets.
Concentration risk matters. If a few players hoard superior feeds, market liquidity and competitive dynamics could change in ways nobody fully anticipates.

Concrete examples (public behavior, not confidential claims)

Quant shops have long used alternative datasets — satellite imagery, aggregated credit-card flows, shipping logs — to complement price signals. That same playbook is being applied now to personalization and credit scoring for lenders and wealth managers.
Fintechs with bank-permissioned transaction streams are experimenting with on-device and hybrid models that try to preserve privacy while still extracting predictive features.

Not flashy, but effective in many cases.

What investors and risk managers should watch

Signs of consolidation: big licensing deals, strategic data buyouts, and exclusive supply agreements.
Regulatory moves on data portability, consumer consent, and algorithmic accountability. Even nonbinding guidance can change valuations quickly.
Metrics that matter beyond top-line growth: customer churn tied to data portability, cost of acquiring data versus marginal revenue, and how often models must be refreshed to remain predictive.

Taken together, these metrics tend to reveal how durable a data advantage really is.
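For readers who want to put numbers on the cost-versus-revenue and refresh-frequency metrics above, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `DataFeed` fields, the `payback_ratio` helper, and the figures are hypothetical and do not describe any real firm or feed.

```python
from dataclasses import dataclass


@dataclass
class DataFeed:
    """Hypothetical inputs for one licensed data feed (illustrative figures only)."""
    name: str
    annual_cost: float              # licensing plus cleaning and labeling, per year
    annual_marginal_revenue: float  # revenue attributed to models that use the feed
    refreshes_per_year: int         # retraining cycles needed to stay predictive
    cost_per_refresh: float         # compute and staff cost of one retraining cycle


def payback_ratio(feed: DataFeed) -> float:
    """Marginal revenue divided by the total yearly cost of keeping the feed useful."""
    total_cost = feed.annual_cost + feed.refreshes_per_year * feed.cost_per_refresh
    if total_cost <= 0:
        raise ValueError("total cost must be positive")
    return feed.annual_marginal_revenue / total_cost


# Made-up example: a receipts feed that needs quarterly model refreshes.
feed = DataFeed(
    name="card-receipts",
    annual_cost=2_000_000.0,
    annual_marginal_revenue=3_000_000.0,
    refreshes_per_year=4,
    cost_per_refresh=250_000.0,
)
ratio = payback_ratio(feed)
print(f"{feed.name}: payback ratio {ratio:.2f}")
```

A ratio near or below 1 would suggest the feed barely pays for its own upkeep, and in this toy setup a feed that must be refreshed more often scores lower at the same revenue.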

The test is simple: who controls the inputs?

AI-driven trading and fintech products are entering a phase where ownership and curation of data matter as much as model design. Investors should look past slick model demos and ask who controls the inputs — and whether those inputs will still be available next year. Regulators face a delicate trade-off between enabling useful innovation and stopping opaque scraping or commercial resale of highly personal financial footprints.

Historically, edge in finance has gone to whoever controlled scarce, high-quality inputs. Today that input is permissioned, persistent data. That’s where the next decade’s winners will be carved out — or litigated over in courts and argued over in capitals.

Related coverage

News · 5 min

SEC, CFTC Eye AI in Trading, Disclosure: A Regulatory Balancing Act

Both the Securities and Exchange Commission and the Commodity Futures Trading Commission are actively scrutinizing the accelerating integration of artificial intelligence into financial markets, focusing on risk management, market integrity, and transparency.

By IMF Alpharoom AI

News · 5 min

Nvidia’s AI Chip Dominance Fueled by Hyperscaler Capital Expenditures

Strong demand for advanced AI accelerators, particularly from major cloud providers, continues to drive Nvidia's revenue growth, despite anticipated moderation in capex.

By IMF Alpharoom AI

News · 4 min

Wall Street's New Gold: How Synthetic Data Is Powering Financial AI — and What Could Go Wrong

Banks and fintechs are racing to replace fragile real-world datasets with synthetic alternatives. That promises speed and privacy, but also new biases, regulatory headaches, and systemic risk.

By Pedro Marini