6/4: Recursive Self-Improvement
Plus: Mythos could release next week, ChatGPT memory improvements, open letter on bioweapons, Meta delays Muse Spark API
It’s Thursday, and we are so close to hitting 100,000 followers on X. Be sure to give us a follow on X, watch us live on X and YouTube, and join our Discord to chat with our hosts live.
Today’s Experts
Mohammad Norouzi (Ideogram)
Olivia Scharfman (Institute for Progress) and Josh Wentzel (Foundation for American Innovation)
Matt Burtell (America First Policy Institute)
Patrick Boyle (American Wetware)
Zac Hill (Office of American Possibilities)
Anastasios Angelopoulos (Arena)
Marvin Von Hagen (Interaction Company of California)
Flo Crivello (Lindy)
Making Sense of the World
Anthropic on Recursive Self-Improvement
A month after co-founder Jack Clark’s blog post on recursive self-improvement, Anthropic has released their own report, based partially on internal data, arguing that AI systems that can fully autonomously design and develop their own successors are not very far away. If true, the implications for AI research narrowly and society more broadly are enormous.
They make a few different arguments:
First, if you just look at public data, you see clear trends of consistent and rapid improvement. The task time horizon tracked in the famous METR graph doubles every four months, and on research and engineering benchmarks like SWE-bench and CORE-bench, models have gone from near-zero to close to 100% in a year or two.
Claude writes 80% of new code at Anthropic now, and the code contributed per engineer per quarter has increased by 8x compared to the pre-2025 baseline. While quantity of code is not the same as quality, this is still a major speedup.
Code written by Claude works most of the time and is rapidly getting better. Claude Mythos Preview was a major improvement in Claude’s ability to solve open-ended problems.
Claude is getting better at open-ended research tasks. In one experiment, Anthropic examined Claude Code transcripts of real research tasks where a researcher had made a mistake. They gave the transcripts before the mistake to Claude models and asked them where to go next. Mythos was able to choose a better action than the one chosen by the human researcher 64% of the time.
AI models are still clearly subhuman at research taste (deciding which problems to pursue in the first place). But Anthropic points out that this capability can improve just like others once seen as inaccessible to AIs, such as explaining why a joke is funny. And even if AI “only” automates most of AI research and engineering, humans can focus on the remaining fraction of tasks and become much more productive.
Putting all this together, Anthropic argues that continued AI acceleration is highly likely, and full recursive self-improvement is strikingly plausible. (Co-founder Jack Clark goes even further, arguing that fully autonomous AI R&D with no human in the loop is >60% likely by the end of 2028, just two and a half years away.)
So what should we do about it? Anthropic recommends building global institutions with the power to coordinate the labs to pause or slow down AI research for a period of time until societal institutions and alignment research have caught up. They even say outright: “If it were possible to effectively slow the development of this technology to give ourselves more time to deal with its immense implications, we think that would likely be a good thing.” Demis Hassabis, CEO of Google DeepMind, is on record saying something similar.
There are two big trends of 2026 coming to a head here. One is the incredibly rapid improvement of frontier models, which in just a year have gone from being slightly helpful with software engineering to being able to complete hours-long tasks fully autonomously, solve open math problems, and show early signs of recursive self-improvement. If these signs continue, models will improve much faster. This would have absolutely monumental implications for practically every aspect of society. I’m optimistic that the world will be radically better, but there’s no doubt that the world will be radically different.
The other trend is the slow but steady awakening of people and institutions to the importance of AGI. Today you see anti-datacenter protests in town halls and voluntary government model evals; tomorrow you might see the full bipartisan force of Congress, or the UN, come together to globally pause AI as firmly as they paused nuclear power. If models improve too fast, we risk misalignment, loss of control, and other catastrophic scenarios. But if institutions have time to react, they may very well enact sweeping restrictions on the technology that prevent us from curing cancer, bringing abundance to everyone in the world, and conquering the stars. We are walking a tightrope, and over the coming years we’ll have to balance very delicately.
And more…
Demis, Dario, and Sama, along with many other figures in AI, biotech, and policy, sign an open letter on bioweapons. For a while, you’ve been able to order synthetic DNA, RNA, and other nucleic acids online. This is extremely useful for biotech research, but it can be misused to synthesize viruses, a risk made worse by the advent of bio-capable AI. The open letter calls for mandatory screening and record-keeping by those in the industry to prevent people from making bioweapons.
OpenAI releases a ChatGPT memory update, “dreaming”. It’s better at carrying forward useful context (so it remembers things after you tell it once), giving answers in line with your stated preferences and constraints, and being aware of the passage of time. Instead of adding to memory only when you tell it to, it automatically synthesizes memory throughout your chat history, and you can correct or delete memories listed in the model’s memory summary. We’re still a ways away from truly ambient memory that understands you like a close friend or significant other, but this is a step in the right direction.
Mythos could release next week. Allegedly, red teamers just got access to a checkpoint called Oceanus, which exceeds the capabilities of Claude Mythos Preview that were reported in April. Once red teaming is complete, Oceanus could be released to the public, possibly as soon as next week.
TSMC CEO says supply can’t meet demand. AI demand continues to skyrocket, and supply continues to be constrained as chip and memory manufacturers hit physical limits. CEO C. C. Wei acknowledged that TSMC would not be able to meet AI demand for years, while committing not to abruptly raise prices.
Meta keeps delaying the Muse Spark API. Meta has sunk to a distant fifth place among the American AI labs despite massive infusions of capex and talent and a huge head start, having existed in some form since 2013. We’ll see if they can turn it around.
Ramp raises $750M at $44B. We had Ramp CPO Geoff Charles on MTS yesterday to talk about the launch of their new agentic accounting software, Stack.
Banger Review
There was practically no speedup on coding tasks throughout the Claude 1/2/3/3.5 era, until the release of Claude 3.7 Sonnet/Opus 4 and Claude Code in early 2025, which produced a ~1.5x speedup. This was increased to ~2.5x by the end of the year with Opus 4.5, then 8x internally at Anthropic with Mythos Preview.
If you’re on the AGI-pilled corner of the Internet, you probably read sentences like this a lot, but really, just stop for a second and consider how insane it is that we are just a handful of years away from real, actual, sci-fi superintelligence. Consider how insane it is that you already have a machine that can find you anything on the internet, write software for you, give you advice on your relationship, answer any question that you have in any field of human knowledge, or teach you how to do physics. I’ve pinched myself every day since November 30, 2022.























