The hidden economics of the AI revolution
Notes from OpenAI on measuring AI's economic impact beyond GDP
Another week, another event with the OpenAI research team...
I recently took part in an “AI Economics” forum hosted by the OpenAI research team. We heard perspectives from economists, policy experts, and technologists, all centered on the core question: How do we harness AI for broad-based economic growth, and how can we better measure that progress?
My main question going into the event was how to integrate AI-centric “centaur” workforces. As with all great events, I left with many more questions and many more threads to dig into.
Intro session (Ronnie Chatterji)
The forum kicked off with a session led by Ronnie Chatterji, OpenAI’s new Chief Economist. He’s been at OpenAI for just three months, and before that he helped implement the CHIPS and Science Act in the Biden administration. Chatterji’s research focuses on how organizations adopt new technologies and how they reconfigure themselves to support that adoption.
Divergent GDP estimates: Ronnie brought up studies highlighting the enormous range of estimates for AI’s impact on GDP: anywhere from Acemoglu’s modest 0.66% projection to Korinek and Suh’s 7–18%. Part of the variance hinges on how much AI will seed entirely new industries versus merely optimizing existing ones. The central question about AI: "Is it just a search engine or something that fundamentally creates new sectors (the way the internet once did)?" He connected this to historical parallels: the steam engine, electricity, and the integrated circuit all took decades before fully rewriting the economy.
Geographic differences and the importance of organization: Another crucial point was that adopting the same technology in different places can yield radically different results. His example was the 1990s IT productivity boom in the U.S. versus the slower uptake in the U.K. Chatterji hinted that similar divergences might appear with AI: some regions or countries might wholeheartedly embrace AI-driven reorganization, whereas others might bolt it onto existing infrastructure and see weaker returns.
Why Chatterji fits the role: Ronnie brings tremendous energy and enthusiasm, and he is an impressive presenter. I noticed how smoothly he navigated macroeconomic themes, historical examples, and the intersection of policy and technology; it was clear why OpenAI hired him as their chief economist. He was comfortable fielding audience questions and wove an engaging narrative to open the afternoon’s discussions.
This time is different (Noam Brown)
“A lot of criticisms of LLMs were valid six months ago, but that’s no longer the case.” — Noam Brown
Why is this AI shift different? Noam Brown (famous for advancing AI in poker and for building an AI that competes in Diplomacy) explained why AI seems so different now compared to earlier waves. Past AI milestones like IBM’s Deep Blue or Watson were huge but narrow — a chess engine can’t just pivot to diagnosing diseases. By contrast, large language models can generalize across many different tasks, from code generation to summarizing complex research.
Two scaling paradigms: Brown dissected two big approaches in AI right now. The first is the “pre-training paradigm,” which rides on ever-increasing compute and data, epitomized by the progression from GPT-2 to GPT-3 to GPT-4, with each jump requiring orders of magnitude more resources. The second is the “reasoning paradigm,” where the model is taught to reason more carefully by scaling up the compute it spends at inference time. Together, these paradigms can lead to AI systems that are more flexible and less dependent on raw pre-training scale alone.
Progress is sneakily fast: Noam emphasized that the speed at which AI evolves can make once-valid critiques quickly outdated. Indeed, the leap from GPT-3 to GPT-4, with improved reasoning capabilities, left some AI skeptics scrambling to reevaluate. From my standpoint, it underscores the need for ongoing, iterative evaluations rather than a static set of assumptions about what AI can or can’t do. I think a lot of people are still mired in a circa-2023 understanding of language models, and in the AI space, that is eons ago.
OpenAI strategy and partnerships (Jason Kwon)
Jason Kwon, OpenAI’s Chief Strategy Officer, provided a broader lens on OpenAI’s approach to partnerships and policy. A former software engineer and product manager who later went to law school, Kwon now leads OpenAI strategy.
The Bitter Lesson: Kwon cited Rich Sutton’s “The Bitter Lesson,” which argues that general methods riding on ever-growing compute and data tend to trump carefully engineered, domain-specific expertise. This parallels the “unreasonable effectiveness of data”: the observation that brute-force, data-driven approaches often outperform carefully handcrafted models.
Prestige bootstrapping: Kwon underscored that, for OpenAI, strategic partnerships with organizations in academia and industry are crucial to keep scaling and to push innovation. Part of his role seems to be forging alliances that raise OpenAI’s profile: he spent the vast majority of his time on partnerships with prestigious universities, corporations, and other institutions that showcase GPT-based solutions. These alliances aren’t just about research synergy; they’re also about building trust and visibility. After all, the debate around AI’s impact is often as much about public perception as it is about the underlying algorithms.
AI productivity and evals (Kevin Weil and Erik Brynjolfsson)
The J-Curve and the "productivity paradox": One of the forum’s highlights was the fireside chat between Kevin Weil (OpenAI’s Chief Product Officer) and Erik Brynjolfsson (Stanford professor and longtime observer of the digital economy). The question on everyone’s mind: Where is the measurable economic impact of AI so far?
Brynjolfsson pointed out that new technologies frequently follow a J-curve: you spend heavily on reskilling and organizational transformations up front, and the payoff appears later. Erik cited the classic example from factory electrification: plants initially swapped a single central steam engine for a single large electric motor, yielding minimal gains. Real productivity improvements arrived only when managers reconfigured factory floors around multiple electric motors driving individual machines. Weil added that in the AI world, some companies are similarly “stuck” in stage one, while others are already rethinking processes and reaping bigger benefits.
Free AI and GDP: Another point Brynjolfsson made was that many new AI products are either free or priced cheaply. This can mask economic impact if you look only at GDP. People get the technology at minimal cost, so the “consumer surplus” is huge, yet it doesn’t show up as a jump in measured GDP. Weil framed it as a potential mismatch: we might see significant real-world benefits without a corresponding spike in official statistics.
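To make that mismatch concrete, here is a stylized illustration with numbers I’ve made up for this example (they are not from the talk): GDP counts transactions at market prices, so a tool given away for free adds roughly nothing to measured output even when its users value it a great deal.

```latex
% Stylized example with assumed numbers (not from the forum):
% a free AI tool used by q = 1,000,000 people, each of whom would
% be willing to pay WTP = $500 per year for it.
\[
\underbrace{p \cdot q}_{\text{counted in GDP}} = \$0 \times 1{,}000{,}000 = \$0,
\qquad
\underbrace{(\mathrm{WTP} - p) \cdot q}_{\text{consumer surplus}} = \$500 \times 1{,}000{,}000 = \$500\ \text{million}.
\]
```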
The impact of intelligence benchmarks: Erik raised a provocative point about how we measure “intelligence” in AI systems, arguing that today’s benchmarks too often assume a human-centric view. In other words, we’re testing AI primarily on tasks that human beings do well — standardized tests, language comprehension, or logic puzzles — rather than exploring forms of intelligence that might be uniquely machine-driven or altogether different from our own. This creates a risk that AI development becomes focused on substituting human intelligence instead of complementing it.
Economically speaking, if AI is viewed purely as a substitute, it can drive down wages or marginalize workers in the tasks where humans currently excel. But if our benchmarks recognize and reward AI’s ability to enhance human efforts (say, by measuring how well a model collaborates with a human team), then AI can become a powerful ally rather than a direct competitor. Erik’s call, then, is to broaden the scope of our evaluations so we encourage AI developers to build systems that fill in the gaps and open up new possibilities, rather than merely matching (and undercutting) human skills.
How to measure AI economic value (Tom Cunningham)
The standardized test trap: Tom Cunningham, an economist and data scientist at OpenAI, dug into the nuanced issue of how we measure the value of AI. A human who scores in the 90th percentile on the LSAT would likely land a very high-paying job. An AI model can also score in the 90th percentile on the LSAT, yet it could not get any job in the real world.
Demand curves and willingness to pay: Cunningham emphasized techniques like measuring user “willingness to pay” (WTP) to gauge the real economic value of these tools. LLMs may have generated around $10 billion in revenue in 2024, just 0.01% of world GDP. But that revenue captures only a sliver of the total surplus generated, since the actual benefits to individuals and businesses go far beyond what they pay. In other words, a lot of AI’s value isn’t fully captured by revenue alone.
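A quick back-of-the-envelope check on that 0.01% figure, assuming world GDP of roughly $100 trillion (my assumption; the talk cited only the percentage), along with the basic reason revenue understates total value:

```latex
% Revenue share of world GDP, assuming world GDP of about $100 trillion:
\[
\frac{\$10\ \text{billion}}{\$100\ \text{trillion}} = \frac{10^{10}}{10^{14}} = 10^{-4} = 0.01\%.
\]
% Revenue understates value whenever willingness to pay exceeds the price actually charged:
\[
\text{total value to users} \;=\; \sum_i \mathrm{WTP}_i \;\ge\; \sum_i p_i \;=\; \text{revenue}.
\]
```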
Augmentation vs. automation: Another angle was the difference between augmentation and automation. Randomized controlled studies in which some workers get LLM access (“augmentation”) and others don’t have found productivity gains ranging from 5% to 50%. Meanwhile, “automation studies” check whether an AI can handle entire tasks (like coding projects) independently. One experiment, “SWE-Lancer,” introduced an AI bidding on coding projects. The best models earned $400,000 out of a possible $1 million: a very strong start, but not quite at the level of top human freelancers yet.
Challenges ahead: Of course, one critical factor is that many jobs don’t have straightforward productivity metrics. Compensation often isn’t tied directly to output. So, even if an AI-augmented individual is far more efficient, the broader economic ripple might not be immediate or might be invisible to standard measurement tools. This leads to tough policy questions about how to support or re-skill workers as tasks increasingly shift to AI or AI-assisted pipelines.