For a year, burning AI tokens became a proxy for ambition. Then the bills came due. A look at tokenmaxxing, Goodhart's Law, and the shift to token efficiency.

The Token Burning Bubble: Inside the Collapse of Silicon Valley's Most Expensive Metric

TL;DR

For about a year, a single number ran Silicon Valley: how many AI tokens you burned. Not what you shipped - what you spent. Engineers raced internal leaderboards, ran agents on nothing, and torched budgets to stay “safely above average.” At the peak, Meta engineers reportedly consumed 60 trillion tokens in a single month.

Then the invoices arrived. The leaderboards are coming down, the budgets are blown, and the metric everyone optimized for turned out to measure activity, not outcomes. Classic Goodhart’s Law. The next phase isn’t about burning more - it’s about output per token.

The ideology of “tokenmaxxing”

The thesis was simple and expensive: more tokens burned equals more productivity. In this framing, AI usage is the ultimate multiplier of human effort. Every prompt, every model response, every background agent task consumes tokens - the basic units of AI computation - so the logic went: spend more, produce more.

The industry’s loudest voices pushed it hard.

Jensen Huang, Nvidia CEO

A senior engineer earning $500,000 who doesn't burn at least $250,000 in tokens a year should be cause for "deep alarm" - token budgets ought to be part of the comp package.

Jensen Huang, Nvidia CEO. Photo by Peter Dasilva, CC BY 4.0, via Wikimedia Commons.

Tobi Lutke, Shopify CEO

Adopt AI tools, or risk being shown the door.

Tobi Lutke, Shopify CEO. Photo by Union Eleven, CC BY-SA 3.0, via Wikimedia Commons.

Andrew Bosworth, Meta CTO

Our top engineers are spending their entire salary equivalent on tokens - and they're 5 to 10 times more productive for it.

Andrew Bosworth, Meta CTO. Photo by UK Home Office, CC BY 2.0, via Wikimedia Commons.

Read those three again. Two of them say nothing about output at all, and the third asserts a productivity multiple without saying how it was measured. What all three have in common is a claim about spend.

The conflict of interest nobody mentioned out loud

Tokenmaxxing was sold as a productivity revolution. It was also, conveniently, a revenue engine for the people selling the picks and shovels.

Nvidia sells the chips that process the tokens. Providers price per token. So when a company stands up a leaderboard that rewards engineers for consuming more, it pumps money straight into the suppliers’ pockets. That creates a systemic pull toward products built for high volume, long contexts, and swarms of parallel agents - whether or not any of that moves a business metric.

When the people measuring the thing also profit from the thing going up, be suspicious of the thing going up.

Goodhart’s Law and the theater of output

When a measure becomes a target, it ceases to be a good measure.

That’s Goodhart’s Law, and tokenmaxxing is a textbook case. The moment token consumption was wired into performance reviews, people stopped optimizing for good work and started optimizing for the number.

What followed was pure theater:

  • Meaningless agents. At Amazon, employees reportedly ran agents on redundant or useless tasks just to keep their usage high enough to survive review season.
  • Gaming the dashboard. Meta developers watched colleagues’ usage dashboards to make sure their own number stayed “safely above average.”
  • Budget-burning side projects. Engineers elsewhere spent company AI budgets on personal projects they never meant to ship, purely to drain their allocation before it reset.
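
To see why a usage leaderboard invites this kind of behavior, here is a toy sketch in Python. Every name and number in it is invented for illustration; none of it comes from the companies mentioned above. It ranks three hypothetical engineers twice: once by tokens consumed, once by changes that actually shipped.

```python
# Toy illustration only: the engineers and figures below are made up.
from dataclasses import dataclass


@dataclass(frozen=True)
class Engineer:
    name: str
    tokens_used: int      # tokens consumed in the review period
    changes_shipped: int  # merged changes that reached production


def rank_by(engineers, key):
    """Return names sorted from highest to lowest on the given attribute."""
    ordered = sorted(engineers, key=lambda e: getattr(e, key), reverse=True)
    return [e.name for e in ordered]


team = [
    Engineer("idle_agent_runner", tokens_used=900_000_000, changes_shipped=2),
    Engineer("steady_shipper", tokens_used=120_000_000, changes_shipped=14),
    Engineer("careful_prompter", tokens_used=40_000_000, changes_shipped=9),
]

token_leaderboard = rank_by(team, "tokens_used")
outcome_leaderboard = rank_by(team, "changes_shipped")

print(token_leaderboard)    # ['idle_agent_runner', 'steady_shipper', 'careful_prompter']
print(outcome_leaderboard)  # ['steady_shipper', 'careful_prompter', 'idle_agent_runner']
```

In this made-up team, the person at the top of the token leaderboard is at the bottom of the outcome ranking. Real teams will be messier, but the sketch shows how the two orderings can come apart once the number itself becomes the goal.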

We’ve seen this movie before. Commit farming. Lines-of-code targets. Story-point inflation. Every time, the same lesson: measure activity instead of outcome and people will hand you activity, beautifully optimized and completely hollow.

The economic reckoning

The financial hangover hit fast.

Uber’s CTO revealed the company blew through its entire annual AI budget by April. Microsoft reportedly cancelled Claude Code subscriptions across divisions after costs ran away. And the most damning part: some companies are said to be laying off staff to pay their AI bills. The compute didn’t replace the work - it replaced the headcount that used to do the work.

The broader data doesn’t rescue the story either:

  • Deloitte found nearly half of finance leaders can’t point to clear, measurable value from their AI spend.
  • 70% of organizations claim positive ROI, but fewer than 1% report returns of 20% or more.
  • Most of the wins are “soft” productivity gains in the 1-5% range - exactly the kind that is impossible to pin down and easy to imagine.

A revolution that can’t find its own ROI isn’t a revolution. It’s a line item.

What efficiency actually looks like

The correction is already underway, and it has a name: token efficiency. Output per token, not tokens burned.
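
As a rough sketch of what “output per token” could mean in practice, the Python below divides a count of outcomes by tokens spent and converts the spend to dollars. The function names, the choice of “outcomes” as the numerator, and the flat price of $10 per million tokens are all assumptions made for the example, not figures from any provider.

```python
# Hypothetical sketch: names, outcome definition, and price are assumptions.

def outcomes_per_million_tokens(outcomes: int, tokens: int) -> float:
    """Outcomes delivered for every one million tokens consumed."""
    if tokens <= 0:
        raise ValueError("tokens must be positive")
    return outcomes / (tokens / 1_000_000)


def cost_per_outcome(outcomes: int, tokens: int, usd_per_million: float) -> float:
    """Dollar cost of each outcome at a flat per-token price."""
    if outcomes <= 0:
        raise ValueError("outcomes must be positive")
    return (tokens / 1_000_000) * usd_per_million / outcomes


# Assumed flat price; real pricing varies by model and by input vs. output tokens.
PRICE = 10.0

heavy = cost_per_outcome(outcomes=2, tokens=900_000_000, usd_per_million=PRICE)
light = cost_per_outcome(outcomes=9, tokens=40_000_000, usd_per_million=PRICE)

print(f"heavy user: ${heavy:,.2f} per outcome")
print(f"light user: ${light:,.2f} per outcome")
```

What counts as an “outcome” is the hard part and has to be chosen per team: a merged change, a closed ticket, a resolved incident. The arithmetic only becomes useful once that definition is something people cannot inflate as easily as a token count.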

This isn’t just a vibe shift. Research keeps showing that piling on more tokens hits sharply diminishing returns past a threshold - the relationship between volume and quality was never linear. Past a point, you’re paying more to get a worse answer that took longer.

So the metrics that survive will be the boring, concrete ones:

  • Review cycles shrinking (four days down to one).
  • Time-to-ship dropping.
  • Choosing cost-efficient, economically useful models over the most expensive option on the menu.

None of those have a leaderboard. That’s the point.

The takeaway

Tokenmaxxing was never about productivity. It was a number that was easy to grow, easy to game, and profitable for the people who set it. The companies that come out ahead won’t be the ones that burned the most - they’ll be the ones that made the technology answer for specific, measurable outcomes.

Hold your tools accountable for results, not for receipts. The era of the gamed leaderboard is over. The era of accountability is just getting started.