Anthropic's warning over AI self-improvement has a hidden message — accelerating development requires more compute before companies ever risk losing control of frontier AI models
Jun 09, 2026 - 22:08
01
(Image credit: Getty Images / Bloomberg)
The company that just a few weeks ago told us that its Mythos model was too powerful to be released publicly is now saying that we might need to hit the pause button on AI altogether, while also teaching its AI to build itself. On June 4, Anthropic published a report, when AI builds itself, showing that Claude now writes more than 80% of the code merged into its own production codebase, up from the low single digits before Claude Code reached research preview in February last year, and arguing that the loop has begun to accelerate AI development in a way that could eventually leave humans unable to control the systems being built.
The Anthropic Institute, the firm's research arm, casts the trend as early movement toward recursive self-improvement, the point at which a model designs and builds its own successor without meaningful human input, and warns that the rare misalignment in today's models could keep "growing more frequent but less understood until we lose control of them."
Reading further into the post, and taking the entire frontier AI model development ecosystem reveals some other uncomfortable truths that the developers of cutting-edge AI models also have to reckon with: compute.
Loss of control
Anthropic gave us three predictions of ways the next few years could play out, reserving a particularly dire warning for the case in which models become capable of fully improving themselves. Progress, Amodei’s lab argues, would then be paced almost entirely by available compute, human engineers would be pushed into oversight and verification, and a self-improving model could come to dominate as its abilities outstrip those of the people who built it.
The firm called this — the task of keeping a system's behavior tied to human intent — the part of this future it’s least sure about. A capable, well-aligned model might discover new ways to keep its successors safe, it said, or the reverse could hold, and misalignment could compound generation over generation, with the unusual concession that a sufficiently wise model might instead choose to halt its own development.
The idea of an ultraintelligent machine designing still better machines (“singularity”) has been around for decades. British mathematician I. J. Good argued back in the 1960s through his “intelligence explosion” thesis that such a machine would be the “last invention that man ever need make,” so long as it remained “docile enough” to tell us how to control it. Meanwhile, the “Godfather of AI,” Geoffrey Hinton, has put the odds of AI causing human extinction within three decades at 10% to 20%.
The International AI Safety Report, chaired by Yoshua Bengio and published in January 2025 with input from more than 100 experts across 30 countries, defines loss of control as a scenario in which AI systems operate outside anyone's control with no clear path to regaining it.
Every figure behind the warning coming out of Anthropic is based on data from within, and none of it has been independently audited. Among this data is its claim that in Q2 2026, the typical Anthropic engineer is merging eight times as much code per day as in 2024. On the hardest, least-specified coding tasks, Claude succeeded 76% of the time in May 2026, a rise of 50 percentage points in six months. On an internal test that asks each new model to make training code run faster, results climbed from roughly triple the original speed with Claude Opus 4 in May 2025 to about 52 times with the unreleased Mythos Preview model by April 2026, against the four to eight hours a skilled researcher needs for a fourfold gain.
In fairness, Anthropic does then call lines of code a poor proxy for output and admits that the eight times figure almost certainly overstates the real gain. Its research-judgment study, in which models beat the human's next step 64% of the time, drew on 129 moments the company deliberately picked because the human's choice had room for improvement, so it’s not a like-for-like contest.
The report publishes no breakdown isolating how much recent capability gain comes from the self-improvement loop rather than from raw compute, more data, and human-led research. Cognitive scientist Gary Marcus called the piece a "bait and switch" on his Substack, arguing the company had shown faster coding under human direction rather than a system improving itself. Bentley University mathematician Noah Giansiracusa told Scientific American, "I don't think it's a genuine call to slow down."
AI is writing everyone’s code
(Image credit: Anthropic)
Anthropic isn’t alone here. Google CEO Sundar Pichai said in an April blog post that 75% of new code at Google is now AI-generated and approved by engineers, up from 50% the previous autumn. OpenAI's Jakub Pachocki has described the company's Codex agent as “a very early version of an AI researcher,” and OpenAI has said it’s building toward a fully automated one. Chinese developer MiniMax marketed its M2.7 model in March as "self-evolving," claiming it ran its own scaffold-optimization rounds and handled a large share of its reinforcement-learning research, though the benchmarks were internal and unreplicated.
Independent measurements do somewhat support a trend of fast improvement without confirming a runaway one that the AI labs are talking about. METR, for example, found last year that the length of task an AI can finish with 50% reliability has been doubling roughly every seven months. On its RE-Bench research benchmark, the best agents beat human experts given two hours, but the humans pulled ahead at eight hours and roughly doubled the top agent's score at 32 hours. AI's advantage so far sits in short, well-defined bursts, not the sustained, open-ended work that research depends on, which is the human edge Anthropic has said is still holding strong.
No compute means no runaway AI
Anthropic half-buries the fact that it’s compute capacity that’s ultimately the binding constraint in all of this. It names chip fabrication, grid expansion, and interconnect bandwidth as the factors that could cap progress ahead of intelligence itself. We’re all aware that those limits are solid as things currently stand: SK hynix and Micron have sold out HBM output for the year, high-power transformers carry three-to-five-year lead times, switchgear is booked into 2028, and grid-interconnection queues run three to seven years.
A Sightline Climate analysis estimated that 30% to 50% of large data centers due to open in 2026 will slip or cancel. U.S. data centers drew about 4.4% of national electricity in 2023, a share the Department of Energy's Lawrence Berkeley National Laboratory expects to reach 6.7% to 12% by 2028. Meanwhile, the four largest hyperscalers are on course to spend more than $650 billion on AI infrastructure this year.
Whether compute ultimately puts a lid on any out-of-control, self-improving loop is an unsettled debate. Forethought researcher Tom Davidson argues that there’s a chance that compute bottlenecks won’t “slow down a software intelligence explosion until its late stages,” while Epoch AI counters that if compute and cognitive labor are complements rather than substitutes, software-only acceleration stalls once it hits a compute wall.
‘No, you hang up first’
As for pausing AI development, Anthropic says it’ll only do this if rival labs at or near the frontier do the same in a verifiable way, and that a halt by one company wouldn’t change who’s leading the way.
This is a facetious suggestion at best that insults the intelligence of anyone who has been paying attention to the AI arms race. It’s beyond obvious that no lab this far down the road — let alone Anthropic — is ever going to ease off, especially when Anthropic’s own report essentially doubles as a piece of marketing for how fast it can make Claude build Claude. To suggest in one breath that AI might need to be paused or slowed down in one breath and then saying “but everyone else needs to go first” in another is quite the remark.
Anthropic’s report also came just days after the company confidentially filed for an IPO at a reported valuation near $965 billion, a glaring juxtaposition that read as a front-runner lobbying for limits it stands to help set. Anthropic made a self-assessment in April, when it said its Mythos Preview model had found thousands of severe vulnerabilities, a claim that later drew scrutiny over how much of it rested on a small manual sample.
Luke James is a freelance writer and journalist. Although his background is in legal, he has a personal interest in all things tech, especially hardware and microelectronics, and anything regulatory.
Comments (0)