01 / The Inference Economy
22 APR 2026

The Economics of AI Inference

The fastest-growing line item on corporate P&Ls has no price discovery, no benchmarks, and no forward curve.



AI inference is rapidly becoming one of the largest unhedged variable costs in corporate finance, and no financial instrument exists to manage it. For CFOs and corporate treasurers who spent the last decade building sophisticated hedging programs for currencies, commodities, and interest rates, this is a blind spot the size of a balance sheet.

Token prices, the per-unit cost of using large language models, have collapsed by 99.9% in five years, falling from $60 per million tokens at GPT-3's launch in 2020 to $0.05 today. Yet that deflationary miracle masks a volatile, subsidy-dependent, geopolitically fragile cost structure that could snap back without warning. Every previous commodity with these characteristics (weather, electricity, carbon, bandwidth) eventually spawned a derivatives market. AI compute will be next. The only question is when.

This is the first installment of a three-part series examining AI's emerging cost architecture through the lens of corporate finance. Part one maps the terrain: how token pricing works, why prices have fallen so dramatically, why that trend cannot be extrapolated linearly, and what historical precedents tell us about what comes next.

Let’s get into it.


The anatomy of a token: how AI's unit economics actually work

Every interaction with a large language model (every customer service chatbot response, every contract reviewed, every financial forecast generated) is metered in tokens. One token equals roughly four English characters or, on average, three-quarters of a word. The industry bills per million tokens (MTok), with a critical asymmetry: output tokens cost 3–5x more than input tokens because generating text requires sequential, compute-intensive processing, while reading prompts can be parallelized.¹
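
In code, the billing model is a single multiply-and-sum. A minimal sketch; the $3/$15 per-MTok rates are illustrative, in the ballpark of the mid-tier prices discussed in this piece:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_per_mtok: float, output_per_mtok: float) -> float:
    """Dollar cost of one API call billed per million tokens (MTok)."""
    return (input_tokens * input_per_mtok
            + output_tokens * output_per_mtok) / 1_000_000

# A 2,000-token prompt with a 500-token reply at $3/$15 per MTok:
# output tokens are 20% of the volume but ~56% of the cost.
print(request_cost(2_000, 500, 3.00, 15.00))  # -> 0.0135
```

The asymmetry matters for budgeting: verbose model outputs, not long prompts, usually dominate the bill.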

The major AI labs (OpenAI, Anthropic, Google, etc.) have standardized around a three-tier pricing structure that mirrors how other commodity markets grade product quality. OpenAI's lineup in Q1 2026 spans from GPT-5.2 Pro at $21/$168 per MTok (input/output), a premium reasoning engine, down to GPT-5 nano at $0.05/$0.40. Anthropic ranges from Claude Opus 4.6 at $5/$25 to Claude Haiku 4.5 at $1/$5. Google's Gemini stretches from its 3.1 Pro at $2/$12 to Flash-Lite at $0.075/$0.30. And then there's DeepSeek, the Chinese lab that shattered pricing conventions: its V3.2 model charges $0.28/$0.42, with cached input tokens dropping to an almost absurd $0.028 per million.²

Today, three pricing mechanisms function as proto-financial instruments. 1) Batch processing (accepting a 24-hour delivery window) earns a 50% discount across all major providers. 2) Prompt caching (reusing repeated input sequences) delivers up to 90% savings. And 3) reserved capacity commitments from cloud providers lock in GPU access at 30–75% discounts for one- to three-year terms.³ These are, in effect, forward contracts without the financial infrastructure: no secondary market, no price discovery mechanism, no transferability. Corporate treasurers would recognize them immediately as commitments with embedded optionality, but with none of the hedging tools that would normally accompany such exposure.
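
As a rough sketch of how the first two mechanisms compound, using the headline discount rates above (50% batch, 90% cache) and assuming the two stack multiplicatively, which individual providers may or may not allow:

```python
def effective_input_price(base_per_mtok: float,
                          batch: bool = False,
                          cached_fraction: float = 0.0) -> float:
    """Blend cached (90% off) and uncached input tokens, then apply the
    50% batch discount if a 24-hour delivery window is acceptable."""
    blended = (base_per_mtok * 0.10 * cached_fraction      # cached portion
               + base_per_mtok * (1.0 - cached_fraction))  # full-price portion
    return blended * 0.5 if batch else blended

# $3/MTok input, 80% of the prompt served from cache, run as a batch job:
print(effective_input_price(3.00, batch=True, cached_fraction=0.8))  # ~$0.42/MTok
```

An 86% effective discount, without any negotiation: this is why sophisticated buyers already treat these mechanisms as de facto hedges on unit cost, if not on price volatility.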


A 1,000x collapse in three years: a spectacular price decline in an emerging commodity

The trajectory of token prices is not merely deflationary. It is historically unprecedented in its velocity: faster than Moore's Law, faster than bandwidth during the dot-com boom, faster than storage costs during the cloud era.

When OpenAI launched GPT-3's Davinci model in June 2020, API access cost $60 per million tokens, a unified rate for both input and output.⁴ By August 2022, a price cut brought it to $20.⁵ Then GPT-3.5-Turbo arrived in March 2023 at $2 per million tokens, a 97% drop that made AI economically viable for production applications for the first time. The same month, GPT-4 launched at the premium end: $30 input / $60 output per MTok, marking the start of the dual-track pricing era.⁶

What followed was a relentless staircase downward. GPT-4 Turbo (November 2023) cut input costs 67% and output costs 50%. GPT-4o (May 2024) halved prices again. GPT-4o mini (July 2024) delivered frontier-adjacent quality at $0.15/$0.60, a massive 99.5% decline from GPT-4's launch price just sixteen months earlier.⁷ Andrew Ng, co-founder of Google Brain and founder of DeepLearning.AI, calculated the GPT-4 to GPT-4o transition represented a price decline of roughly 79% per year — or 87% using batch pricing.⁸

Anthropic's trajectory tells a subtler story. While its mid-tier Sonnet line has held remarkably stable at $3/$15 across multiple generations (with each generation delivering substantially better performance at the same price), the flagship Opus line dropped from $15/$75 (Claude 3 through 4.1) to $5/$25 with Opus 4.5 in November 2025, a 67% reduction while delivering meaningfully superior capabilities.⁹

Andreessen Horowitz published the definitive analysis of this trend in November 2024 under the memorable title "LLMflation." Investor Guido Appenzeller documented that for an LLM of equivalent performance, cost decreases by roughly 10x every year. GPT-3-quality inference fell from $60 per million tokens in November 2021 to $0.06 by 2024, a 1,000x decline in three years.¹⁰
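
The roughly-10x-per-year figure falls out of the endpoints directly. A quick check:

```python
def annual_decline_factor(p_start: float, p_end: float, years: float) -> float:
    """Constant yearly factor f satisfying p_start / f**years == p_end."""
    return (p_start / p_end) ** (1.0 / years)

# GPT-3-quality inference: $60/MTok (Nov 2021) -> $0.06/MTok (2024).
print(annual_decline_factor(60, 0.06, 3))  # ~10x per year, the LLMflation rate
```

A 1,000x decline over three years is exactly a 10x annual factor; the same function applied to Epoch AI's benchmark-level data yields the wider 9x–900x range cited below.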

Epoch AI corroborated this with granular analysis across six benchmarks in March 2025, finding decline rates ranging from 9x to 900x per year depending on the capability measured, with a median of roughly 50x per year. Their most striking finding: the price to achieve GPT-4's performance on PhD-level science questions fell by 40x per year.¹¹ Stanford's 2025 AI Index Report, widely cited by the Federal Reserve and Brookings Institution, documented a 280-fold drop from November 2022 to October 2024 for GPT-3.5-level performance.¹²

An academic paper from OpenRouter researchers, based on more than 100 trillion tokens served through their platform, revealed an important paradox: while the cost of any given intelligence level has plummeted 1,000x since 2023, the average price paid per token has remained relatively constant because buyers consistently upgrade to the most capable available model.¹³

Cheaper tokens don't reduce bills. They raise expectations.


Four engines of deflation and why each has limits

The collapse in token prices is not a single phenomenon but the convergence of four distinct forces, each with its own trajectory and ceiling.

1. Subsidization at unprecedented scale. OpenAI reported $5 billion in losses on $3.7 billion in revenue in 2024.¹⁴ Deutsche Bank estimates the company will accumulate roughly $143 billion in cumulative negative free cash flow between 2024 and 2029 before reaching profitability, prompting analysts to note that "no startup in history has operated with losses on anything approaching this scale."¹⁵ OpenAI projects $14 billion in losses for 2026 alone and $44 billion cumulatively from 2023 through 2028, targeting $100 billion in revenue and profitability by 2030.¹⁶ Google spends an estimated $5 billion annually on Gemini operations while pricing Flash models below $1 per million tokens. DeepSeek, which triggered a $600 billion single-day wipeout in NVIDIA's market cap when it launched R1 in January 2025, prices its V3.2 model at roughly one-tenth the cost of GPT-5.¹⁷

This is a land grab, not a market. And land grabs end.

2. Hardware improvements approaching a Moore's Law equivalent. NVIDIA's GPU roadmap delivers compounding gains: the B200 (Blackwell) provides roughly 4x throughput over the H100 at modest price premiums using native FP4 precision.¹⁸ The B300 (Blackwell Ultra) delivers 11–15x faster LLM throughput per GPU versus the Hopper generation. Cloud H100 rental prices have fallen 64–75% from peak, stabilizing at $2.49–$3.50 per hour from highs of $8–$10.¹⁹ Custom silicon accelerates the trend: Google's TPU v6 (Trillium) achieves 4.7x peak compute over its predecessor;²⁰ Amazon's Trainium2 claims 54% lower cost per token than A100 clusters;²¹ Midjourney's migration from NVIDIA GPUs to Google TPUs cut their monthly compute spend from $2.1 million to under $700,000 — annualized savings of $16.8 million.²² But each hardware generation requires years of design, fabrication, and deployment.

The gains are real but not instantaneous.

3. Inference optimization techniques that squeeze more from existing silicon. Quantization, compressing model weights from 16-bit to 4-bit precision, delivers 60–70% cost reduction with minimal quality loss.²³ Mixture of Experts architectures, exemplified by DeepSeek-V3.2's design of 671 billion total parameters with only 37 billion activated per token, provide frontier-class capability at a fraction of dense model cost.²⁴ Speculative decoding accelerates inference 2–3x without additional hardware. PagedAttention reduces memory waste by 55% and enables 10x more concurrent users on the same hardware.²⁵ Roblox combines 2-bit quantization with speculative decoding, accepting 5% quality degradation for 90% cost reduction while serving 100 billion tokens daily.²⁶ These techniques are powerful but increasingly subject to diminishing returns.
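
Stacking the figures above (a 65% cut from quantization, a 2.5x speedup from speculative decoding) under a naive independence assumption lands near the 90% reduction Roblox reports; real gains overlap, so this is an upper-bound sketch:

```python
# Naive independent stacking of the optimization figures cited above.
baseline = 1.0
after_quant = baseline * (1 - 0.65)   # quantization: 60-70% cost reduction
after_spec = after_quant / 2.5        # speculative decoding: 2-3x speedup
print(f"{1 - after_spec:.0%} total cost reduction")  # -> 86%
```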

You can only quantize to zero once.

4. Open-source competition establishing price ceilings. The OpenRouter marketplace has grown from roughly 60 models in early 2024 to over 650 by December 2025, with 434 open-source models versus 217 closed-source. Open-source models are approximately 90% cheaper than closed-source alternatives at equivalent intelligence levels.²⁷ Meta's decision to open-source Llama created structural price pressure that Sam Altman acknowledged directly: "It was clear that if we didn't do it, the world was gonna be mostly built on Chinese open-source models."²⁸ The proliferation is real, but maintaining and improving open-source models still requires billions in investment.

Meta didn't release Llama out of charity.


The volatility thesis: six reasons token prices will not decline forever

For a CFO constructing a three-year operating budget, the deflationary narrative is dangerously seductive. It assumes that current forces will continue unabated.

They cannot. They will not.

At least six structural factors create conditions for significant volatility, including potentially sharp price reversals.

1. The subsidy cliff. OpenAI has raised $57.9 billion across eleven funding rounds, including a record $40 billion raise in March 2025. Its transition to a for-profit public benefit corporation in October 2025 signals mounting investor pressure for returns. With gross margins of only ~40% (constrained by variable compute costs and far below typical software margins), the path to profitability almost certainly involves raising prices, reducing subsidies, or both.²⁹

When ride-sharing and meal-delivery companies like Uber and DoorDash reached profitability, prices rose significantly. But having already displaced the incumbent services, both businesses continued to grow. By then the ship had sailed: customers paid the higher prices rather than go back.

The same dynamic inevitably applies to AI inference. The question is not whether, but when, and how abruptly.

2. Energy costs are structural and rising. Global data center electricity consumption reached 415 TWh in 2024, roughly 1.5% of global electricity demand.³⁰ The International Energy Agency projects this will more than double to ~945 TWh by 2030.³¹ U.S. data centers alone consumed 183 TWh in 2024, exceeding 4% of total U.S. electricity, roughly equivalent to Pakistan's entire annual demand.³² In Virginia, data centers consumed 26% of the state's total electricity supply in 2023.³³ Carnegie Mellon estimates data centers could drive an 8% increase in average U.S. electricity bills by 2030, exceeding 35% in high-demand markets like northern Virginia.³⁴ In the PJM electricity market (spanning Illinois to North Carolina), data centers caused an estimated $9.3 billion increase in 2025–26 capacity auction costs. Big Tech sees the bottleneck and has responded: Microsoft's $16 billion, 20-year deal to restart Three Mile Island; Google's agreement with Kairos Power for 500 MW from six to seven small modular reactors; Amazon's $20 billion-plus investment converting nuclear sites to AI campuses; and Meta's RFP for 1–4 GW of new nuclear generation all confirm that energy is a binding constraint, not a solved problem.³⁵

These are 20-year commitments. It's clear that AI energy demand isn't cyclical or transitory, it's structural.

3. Production has a single point of failure. Taiwan Semiconductor Manufacturing Company (TSMC) commands approximately 60–70% of global foundry market share and is effectively the only company capable of mass production at 3nm and below. TSMC controls more than 90% of advanced AI chip production, and Broadcom confirmed in 2026 that TSMC is "hitting production capacity limits."³⁶ CoWoS advanced packaging is fully booked even at 90,000 wafers per month. HBM3 memory pricing has risen 20–30% year-over-year with six- to twelve-month lead times.³⁷ NVIDIA's Blackwell GPUs are "already backlogged for a year or more." TSMC's advanced-node capacity in the United States will likely remain under 10% of its total by 2030.

Any disruption to TSMC's Taiwan operations, geopolitical, seismic, or otherwise, would create an immediate supply crisis for global AI infrastructure.

4. Geopolitical fragmentation threatens supply. U.S. export controls have prevented NVIDIA from selling advanced chips to China, forcing DeepSeek to delay its R2 model due to challenges training on Huawei chips.³⁸ The U.S. informed TSMC that its Nanjing fab export waiver would expire at end of 2025. NVIDIA and AMD received licenses for China-specific chips only by agreeing to hand over 15% of revenue from those sales. The semiconductor supply chain concentrates across just three countries: the United States (design), the Netherlands (lithography), and Taiwan (fabrication).³⁹

The bottlenecks and extreme regionalization have created what analysts call an "extraordinarily vulnerable supply chain."⁴⁰

5. Regulation adds cost. The EU AI Act's high-risk system obligations take full effect August 2, 2026, with penalties up to €35 million or 7% of global revenue. European Commission studies estimate roughly 17% overhead on AI spending for high-risk systems.⁴¹ Large enterprises face $8–15 million in initial compliance investment. As much as 18–50% of deployed enterprise AI systems could fall under high-risk classification, far above the Commission's initial 5–15% estimate.⁴²

Increased compliance costs will inevitably be passed through to customers as a component of token pricing.

6. Agentic AI is a token multiplier, not a token saver. Gartner predicts that by 2030, inference costs for one-trillion-parameter LLMs will drop 90%-plus, but agentic models require 5–30x more tokens per task than standard chatbots.⁴³ As token consumption rises faster than costs fall, overall inference costs increase. IDC projects more than one billion actively deployed AI agents worldwide by 2029, executing over 217 billion actions per day and consuming 3.7 TeraTokens daily — with token delivery costs surpassing $68 billion annually.⁴⁴
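
The arithmetic behind the multiplier is stark. Taking Gartner's 90% cost decline together with the 5–30x agentic token multiplier:

```python
def spend_multiplier(token_multiplier: float, cost_decline: float) -> float:
    """Relative spend per task versus today (1.0 = unchanged)."""
    return token_multiplier * (1.0 - cost_decline)

for m in (5, 30):
    print(f"{m}x tokens at 90% lower cost -> "
          f"{spend_multiplier(m, 0.90):.1f}x spend")
# 5x tokens  -> 0.5x spend per task
# 30x tokens -> 3.0x spend per task
```

Even at the low end of the agentic range, the savings shrink to half; at the high end, per-task spend triples despite a 90% price cut.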

VentureBeat's assessment is blunt: "Inference costs have fallen 1,000-fold, but demand has risen 10,000-fold."

This is Jevons Paradox applied to intelligence.


AI spend is becoming a top-five corporate line item

The abstraction of token pricing obscures a concrete reality: AI is becoming one of the fastest-growing operating expenses in corporate finance. Gartner forecasts worldwide AI spending will reach $2.52 trillion in 2026, roughly 41% of total worldwide IT spending.⁴⁵ BCG reports companies plan to double AI spending in 2026, lifting it to about 1.7% of revenues. Deloitte's 2025 Tech Executive Survey found AI's percentage of technology budgets is expected to rise from 8% to 13% within two years.⁴⁶ KPMG's AI Quarterly Pulse Survey tracked average AI investment climbing from $114 million in Q1 2025 to $130 million by Q3 2025, a 10% increase in just six months.⁴⁷

The critical structural insight for treasurers is that 80–90% of enterprise AI costs are inference, not training.

Training is a one-time capital expenditure. Inference, actually running models in production, is an ongoing, variable operating cost that scales with business success.⁴⁸ CloudZero's State of AI Costs Report found average monthly AI costs hit $85,521 in 2025, up 36% year-over-year. Monthly cost fluctuations of 30–40% are common for AI workloads on hyperscalers.⁴⁹ Gartner has warned that companies scaling AI face cost estimation errors of 500–1,000%.⁵⁰

The hyperscalers are betting the house.

Amazon's projected 2026 AI capex is $200 billion. Alphabet: $175–185 billion. Meta: $115–135 billion. Microsoft: $120 billion-plus. Oracle: $50 billion. Combined, the Big Five plan to spend approximately $660–690 billion in capital expenditure in 2026, with roughly 75% (some $450 billion) directly tied to AI infrastructure.⁵¹

This spending will consume an estimated 94% of their operating cash flows (minus dividends and buybacks), up from 76% in 2024. Big tech companies issued $100 billion in bonds in early 2026 to fund AI capex, with Alphabet alone holding a $25 billion bond sale that quadrupled its long-term debt.⁵² JPMorgan's analysis calculated that to drive a 10% return on modeled AI investments through 2030 would require roughly $650 billion of annual revenue in perpetuity.⁵³

On earnings calls, AI has become the dominant theme. FactSet data shows 306 S&P 500 companies cited "AI" on Q3 2025 earnings calls, nearly triple the five-year average of 136.⁵⁴ JPMorgan Chase raised its technology budget to approximately $20 billion for 2026; CFO Jeremy Barnum told analysts that managers have been instructed to avoid hiring as the bank deploys AI across business lines. Goldman Sachs CEO David Solomon announced an AI-driven operating model overhaul focused on "front-to-back work streams that can significantly benefit from AI-driven process reengineering."⁵⁵ Yet PwC's 2026 Global CEO Survey reveals a sobering counterpoint: 56% of CEOs report neither increased revenue nor decreased costs from AI. Only 12% reported both.⁵⁶ The gap between investment and realized returns is already enormous, and the bills are just beginning to arrive.


When weather became a commodity and four other markets that didn't exist until they had to

The argument that AI compute will eventually trade like a commodity is not speculative; it is historically inevitable. Every major cost input that exhibits volatility, cannot be stored, and lacks hedging instruments eventually spawns a derivatives market. The pattern has repeated at least five times in recent memory.

1. Weather derivatives emerged from a risk that was simply accepted. In September 1997, Enron and Koch Energy executed the first publicized weather derivative, a degree-day swap based on Milwaukee winter temperatures.⁵⁷ The catalyst was deregulation: energy companies could no longer pass weather-related demand volatility to customers. A strong El Niño that winter made weather's economic impact front-page news. The U.S. Department of Energy estimated one-seventh of the economy was directly weather-dependent. The CME launched exchange-traded weather futures in September 1999.⁵⁸ By 2005–2006, the market peaked at approximately $45 billion in notional value. In 2023, CME weather derivative trading volumes surged over 260% year-over-year.⁵⁹

A risk that was invisible became a multi-billion-dollar asset class in under a decade.

2. Electricity markets created hedgers out of price-takers. The UK's Electricity Pool launched in April 1990. NYMEX listed the first electricity futures in March 1996. Nord Pool became the first multinational electricity exchange in 1996.⁶⁰ Then California's 2000–2001 crisis, during which prices spiked 17x from ~3¢/kWh to 50¢/kWh, proved why hedging was essential.⁶¹ Today, corporate Power Purchase Agreements alone represent a $28–31 billion market, projected to reach $97–110 billion by 2033, having grown 33% annually since 2015. Amazon alone has 34 GW contracted.⁶²

The critical parallel to AI compute: electricity cannot be economically stored and must be consumed in real time, just like AI inference.

3. Bandwidth trading failed on execution, not concept. When Enron launched bandwidth trading in late 1999, the idea was sound: treat telecom capacity as a tradeable commodity.⁶³ Band-X, RateXchange, and Arbinet attempted the same. It collapsed because only 5% of 40 million miles of fiber were active, completing a bandwidth deal took one to six months due to lack of standardization, and Enron's fraud destroyed market trust.⁶⁴ Former Enron employees now observe that AI infrastructure buildout is "very, very reminiscent of the telecoms boom."⁶⁵

The lesson is not that the concept was wrong; it was premature.

Modern cloud infrastructure has solved the standardization and delivery problems that killed bandwidth trading.

4. Carbon markets priced the unpriced. The EU Emissions Trading System launched in January 2005, covering roughly 40% of EU CO₂ emissions.⁶⁶ Phase 1 was messy. Prices crashed to near zero when over-allocation was revealed. But the market learned. By 2023, EU carbon permits hit a record €98.5, the system generated €43.6 billion in revenue in a single year, and cumulative auction revenue from 2013 to 2025 exceeded €245 billion.⁶⁷

Carbon went from an unmeasured externality to a tradeable commodity with sophisticated futures, options, and corporate hedging strategies.

5. Cloud FinOps proved that compute cost management follows compute adoption. AWS launched EC2 in August 2006. Reserved Instances, essentially forward contracts for compute, arrived in October 2008, offering up to 72–75% discounts for one- to three-year commitments.⁶⁸ Spot Instances followed in December 2009 as an early attempt at market-based compute pricing.⁶⁹ The FinOps Foundation launched in February 2019 with 30 members. Today it represents a 96,000-person community across 15,000+ companies, including 93 of the Fortune 100.⁷⁰ Organizations practicing FinOps save an estimated 20–30% on cloud expenditure.

Cloud computing went from zero to the largest IT line item in under fifteen years. AI is on the same trajectory, but sped up.


The first compute futures exchanges are already being built

The financialization of AI compute is not a theoretical projection. It is happening now.

Ornn AI Inc., founded in 2025 by former quantitative traders and hardware engineers, raised a $5.7 million seed round in October 2025 to build the world's first regulated compute futures exchange.⁷¹ The company has developed the Ornn Compute Price Index (OCPI), tracking live spot prices for GPU compute across H100, H200, B200, and other hardware types, and the index is now listed on Bloomberg terminals.⁷² Ornn's futures contracts are designed as cash-settled, Asian-style instruments (settled on the arithmetic average of daily index values, mirroring electricity futures design) under CFTC-aligned standards.⁷³
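
The settlement mechanics of such a contract are simple to sketch. The only design detail taken from the description above is Asian-style cash settlement on the arithmetic average of daily index values; the index prints, strike, and contract size below are entirely hypothetical:

```python
def asian_settlement(daily_index: list[float], strike: float,
                     contracts: int, contract_size: float) -> float:
    """Long-side cash payoff: (average daily index - strike) * size * contracts.
    Averaging over the window mirrors electricity futures design and blunts
    the effect of any single day's price spike on settlement."""
    avg = sum(daily_index) / len(daily_index)
    return (avg - strike) * contract_size * contracts

# Hypothetical week of $/GPU-hour index prints vs. a $2.60 locked-in level:
prints = [2.49, 2.55, 2.70, 2.88, 2.93]
print(asian_settlement(prints, strike=2.60, contracts=10, contract_size=1_000))
```

A buyer who locked in $2.60 collects roughly $1,100 here as spot prices average $2.71, offsetting the higher cash cost of the compute itself; that offset is precisely the hedge corporate treasurers currently lack.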

Architect Financial Technologies, a FINRA-registered broker-dealer, announced a partnership with Ornn to launch exchange-traded perpetual futures contracts on GPU and DRAM prices. Brett Harrison, Architect's CEO, stated plainly: "There is an urgent cross-industry need to establish standardized derivatives contracts and centralized order books for compute."⁷⁴

CoreWeave's trajectory reinforces the asset-class thesis. The company holds $18.8 billion in GPU-collateralized debt, using H100s and successor GPUs as collateral through special purpose vehicles, treating GPUs as financeable assets. Its contracted backlog reached $66.8 billion as of December 2025, with OpenAI committing $22.4 billion and Meta $14.2 billion.⁷⁵ Dave Friedman, an analyst who has compared the current moment to "where oil futures were in the early 1980s," describes CoreWeave as looking "closer to a 1990s independent power producer: a leveraged infrastructure vehicle that buys assets with other people's money and leases them back under long-term contracts." The absence of a forward curve for GPU compute, he notes, created "one of the most expensive financing structures in tech history."⁷⁶

Academic work is emerging in parallel. A March 2025 paper on arXiv titled "AI Token Futures Market: Commoditization of Compute and Derivatives Contract Design" proposes standardized token futures contracts including a "Standard Inference Token" definition, settlement mechanisms, and margin systems using mean-reverting jump-diffusion stochastic models borrowed from electricity market design. The paper explicitly compares tokens to electricity and carbon emission allowances.⁷⁷ A separate proposal for a blockchain-based Global Compute Exchange appeared in mid-2024.⁷⁸

The FinOps Foundation's AI working group has published frameworks for AI cost forecasting, generative AI cost tracking, and optimization with contributions from practitioners at Wayfair, General Mills, Dell Technologies, and others.⁷⁹

Multiple startups now compete in AI-specific cost management: CloudZero (whose case study with a global SaaS platform serving 40 million users delivered $1 million-plus in immediate savings), Cast AI (offering an LLM proxy that routes queries to cost-optimal models), Mavvrik (claiming end-to-end AI cost governance), and others.⁸⁰

Kush Bavaria, Ornn's CEO, frames the thesis concisely: "Compute is rapidly becoming the defining commodity of the AI era."⁸¹ Meltem Demirors of Crucible Capital, an Ornn investor, provides the scale: "The AI data center boom is the largest infrastructure build in human history, with $4 trillion booked to be spent by 2030."⁸²

If compute derivatives markets develop multipliers comparable to oil or agricultural markets, where derivatives trade at 10–15x the value of the physical market, the potential compute derivatives market could reach $5 trillion.⁸³


Conclusion: the $2.5 trillion line item nobody can hedge

The pattern is unmistakable. A new cost category emerges. It's unpredictable, material, growing, and not directly manageable with existing instruments. Corporations absorb it passively until the magnitude forces action. Bilateral deals appear. Standardized indices follow. Eventually, exchange-traded derivatives create the price discovery, liquidity, and risk transfer mechanisms that transform an unmanaged expense into a manageable one.

AI compute is somewhere between where weather derivatives and electricity markets stood in the 1990s and early 2000s. The risk is recognized, the first infrastructure is being built, but the institutional frameworks remain embryonic.

The critical difference is speed. Weather derivatives took roughly seven years from concept to exchange-traded products. Cloud FinOps took thirteen years from AWS launch to FinOps Foundation. The compute derivatives market, with Bloomberg-listed indices, CFTC-aligned contract design, and institutional participants already in place, looks like it will compress that timeline to two to three years.

However, while compute cost management has arrived, there is still no way to hedge volatility in the price of AI inference. Today, the lack of a market is largely irrelevant, given the AI labs' heavy subsidization of token costs to drive adoption. But, as explored above, it is not a matter of if but when prices become volatile in both directions.

For CFOs building multi-year budgets, three implications are immediate:

Inference costs should be modeled as volatile, not as a smoothly declining curve. The subsidy cliff, energy constraints, and agentic demand multipliers create significant tail risks that neat trendlines ignore.

AI cost governance needs treasury-grade discipline now, not after the bills become unmanageable; the FinOps-for-AI movement is exactly where cloud FinOps was circa 2015, and early movers will capture structural advantages.

Watch the compute derivatives space closely. Adopting hedging instruments early will position organizations to operate on their own terms rather than as price-takers in a market designed by price-makers.

The AI token is the new oil barrel.

The question is no longer whether a financial market will emerge around it, but whether your organization will be ready when it does.


Sources

  1. Iternal Technologies, "LLM API Pricing Calculator for Enterprise Deployment in 2026"; FinOps Foundation, "How to Build a Generative AI Cost and Usage Tracker"; Introl, "Cost Per Token Analysis: Optimizing GPU Infrastructure."
  2. IntuitionLabs, "LLM API Pricing Comparison (2025)"; Awesome Agents, "LLM API Pricing Comparison — March 2026"; AI Magicx, "LLM API Pricing in 2026."
  3. byteiota, "Cloud Pricing Wars 2025: AWS, Azure, GCP Cost Comparison."
  4. Mem0, "LLM Pricing Timeline."
  5. The Decoder, "OpenAI cuts prices for GPT-3 by two thirds."
  6. Nebuly, "OpenAI GPT-4 API Pricing."
  7. IntuitionLabs, ibid.; Mem0, ibid.
  8. DeepLearning.AI, "Falling LLM Token Prices and What They Mean for AI Companies."
  9. Finout, "Anthropic API Pricing: Complete Guide and Cost Optimization Strategies (2025)"; IntuitionLabs, ibid.
  10. Andreessen Horowitz, "Welcome to LLMflation — LLM inference cost is going down fast," November 2024.
  11. Epoch AI, "LLM inference prices have fallen rapidly but unequally across tasks," March 2025.
  12. Libertify, "Stanford AI Index Report 2025: Key Findings & What They Mean for Business."
  13. Fradkin et al., "The Emerging Market for Intelligence: Pricing, Supply, and Demand for LLMs," December 2025.
  14. TapTwice Digital, "8 OpenAI Statistics (2025): Revenue, Valuation, Profit, Funding."
  15. eMarketer, "OpenAI's forecast $143 billion cash outflow raises stakes."
  16. Yahoo Finance, "OpenAI's own forecast predicts $14 billion loss in 2026 but Nvidia-style $100 billion revenues by 2029."
  17. CNBC, "Why DeepSeek didn't cause an investor frenzy again in 2025"; Fortune, "China's DeepSeek just dropped a new GPT-5 rival," August 21, 2025.
  18. Spheron, "NVIDIA B200 Guide: Specs, Benchmarks, Cloud Pricing & H100 Upgrade"; Exxact Corp, "Comparing Blackwell vs Hopper."
  19. Introl, "Inference Unit Economics: The True Cost Per Million Tokens"; Jarvislabs, "NVIDIA H100 Price Guide 2026."
  20. HPCwire, "Google Announces Sixth-generation AI Chip, a TPU Called Trillium," May 17, 2024.
  21. Cloudexpat, "Cloud AI Platforms Comparison: AWS Trainium vs Google TPU v5e vs Azure ND H100."
  22. AI News Hub, "Nvidia to Google TPU Migration 2025: The $6.32B Inference Cost Crisis."
  23. Introl, "Cost Per Token Analysis: Optimizing GPU Infrastructure."
  24. Introl, "DeepSeek-V3.2 Matches GPT-5 at 10x Lower Cost."
  25. Introl, "Inference Unit Economics," ibid.
  26. Tamingllms, "The Falling Cost Paradox."
  27. Fradkin et al., ibid.
  28. Tech Policy Press, "Taking AI Commoditization Seriously."
  29. Sacra, "OpenAI revenue, valuation & funding."
  30. IEA, "Energy demand from AI — Energy and AI — Analysis."
  31. IEA, ibid.
  32. Pew Research Center, "US data centers' energy use amid the artificial intelligence boom," October 24, 2025.
  33. Pew Research Center, ibid.
  34. Pew Research Center, ibid.
  35. Introl, "Nuclear power for AI: inside the data center energy deals."
  36. Sparknify, "Inside the AI Chip Race: Why the World Still Runs on TSMC"; Capacity, "AI chip demand continues to strain big tech supply chains."
  37. Sourceability, "AI demand sparks memory supply chain strain."
  38. Medium, "The AI Chips Supply Chain Incredible Fragility."
  39. Medium, ibid.; Sparknify, ibid.
  40. Medium, ibid.
  41. Boundless, "What is the EU AI Act? Employer compliance guide."
  42. Medium, "The EU AI Act's Hidden Market: How High-Risk AI Compliance Became a €17 Billion Opportunity."
  43. Gartner, "Gartner Predicts That by 2030, Performing Inference on an LLM With 1 Trillion Parameters Will Cost GenAI Providers Over 90% Less Than in 2025," March 25, 2026; IT-Online, "LLMs will be 100 times more cost-efficient by 2030."
  44. IDC, "Agent Adoption: The IT Industry's Next Great Inflection Point"; FinancialContent, "AI Agents Surge in 2026 Boom."
  45. Process Excellence Network, "Global AI spending will total $2.5 trillion in 2026, says Gartner"; Splunk, "2026 IT Spending and Budget Forecasts."
  46. DailyAIWire, "Shocking Deloitte AI survey: CFO IT leaders Are Clashing Over Billion Dollar Investments 2026"; Deloitte Insights, "The great rebuild: Architecting an AI-native tech organization."
  47. KPMG, "AI Quarterly Pulse Survey," 2025.
  48. Ankur's Newsletter, "The Real Price of AI: Pre-Training Vs. Inference Costs"; Tonygraysonvet, "Training vs. Inference: The $300B AI Shift Everyone is Missing."
  49. OpenMetal, "FinOps for AI Gets Easier with Fixed Monthly Infrastructure Costs"; Medium, "The Real Cost of AI Compute: Training vs. Inference."
  50. Gartner Infrastructure Research, cited in ThinkAI Corp, "AI Inference: A Hidden Cost Crisis," January 13, 2026.
  51. Futurum Group, "AI Capex 2026: The $690B Infrastructure Sprint"; IEEE ComSoc, "Hyperscaler capex > $600 bn in 2026."
  52. CNBC, "Tech AI spending may approach $700 billion this year, but the blow to cash raises red flags," February 6, 2026.
  53. Tom's Hardware, "J.P. Morgan calls out AI spend, says $650 billion in annual revenue required to deliver mere 10% return on AI buildout."
  54. Fortune, "Earnings calls citing 'AI' surge in 2025 as 'uncertainty' mentions fade," December 15, 2025.
  55. CNBC, "Big banks like JPMorgan Chase and Goldman Sachs are already using AI to hire fewer people," October 15, 2025; Banking Dive, "Banks chase AI-fueled efficiencies."
  56. The Register, "Majority of CEOs report zero payoff from AI splurge," January 20, 2026.
  57. Barrieu, P. and Scaillet, O., "A primer on weather derivatives."
  58. Energy Intelligence, "Weather Becomes Newest Market for Energy Firms."
  59. Carbon Credits, "Weathering the Storm: The Rise of $25B Weather Derivatives Market."
  60. Oxford Institute for Energy Studies, "Why Did Electricity Prices Fall in England and Wales?"; ResearchGate, "The hedging performance of electricity futures on the Nordic Power Exchange."
  61. Wikipedia, "2000–2001 California electricity crisis."
  62. BloombergNEF, "Corporate Clean Power Buying Grew 12% to New Record in 2023."
  63. Satellite Today, "Satellite Bandwidth Trading: Will It Fly?," March 10, 2001; MIT Technology Review, "Bandwidth's New Bargaineers," November 1998.
  64. Data Center Dynamics, "The unmaking of Enron Broadband."
  65. Headcount Coffee, "The Enron Broadband Scandal: The Forgotten Half of Enron's Empire."
  66. Wikipedia, "European Union Emissions Trading System."
  67. Mordor Intelligence, "Carbon Credit Market Size, Share & 2030 Growth Trends Report."
  68. Flexera, "AWS Reserved Instances: Ultimate guide [2025]."
  69. Rackspace, "History of Spot Instances."
  70. FinOps Foundation, "About the FinOps Foundation."
  71. Pulse 2.0, "Ornn: $5.7 Million Seed Funding Raised For Launching Compute Futures Exchange," October 2025.
  72. The Innermost Loop, "The First Tradable Compute Price Index," April 2026.
  73. Ornn Research, "Compute Futures," 2026. Available at ornnai.com/research/compute-futures.
  74. PR Newswire, "Architect Financial Technologies Partners with Compute Index Provider Ornn to Launch Exchange-Traded Futures on GPU and RAM Prices," January 21, 2026.
  75. Sacra, "CoreWeave revenue, valuation & funding."
  76. Dave Friedman, "Compute is the Commodity No One Knows How to Price," Substack, February 4, 2026.
  77. arXiv, "AI Token Futures Market: Commoditization of Compute and Derivatives Contract Design," March 2025.
  78. arXiv, "Commodification of Compute," June 2024.
  79. FinOps Foundation, "How to Forecast AI Services Costs in Cloud," 2025.
  80. CloudZero, "AI Cost Optimization At Scale"; Cast AI, "LLM Cost Optimization"; Mavvrik, "State of FinOps 2025."
  81. PR Newswire, ibid.
  82. PR Newswire, "Ornn Raises $5.7 Million Seed Round to Launch the World's First Compute Futures Exchange," October 28, 2025.
  83. Compute Exchange, "The $5 Trillion Opportunity: A Compute Futures Market."