Fable 5-safe-v2-final-final
No one knows how to talk about AI cyber risk
The First Rule of Fable Club
Good news, everyone! Fable 5 is back. They fixed it. It’s safe now. It wasn’t safe before, but after a series of productive conversations with the US government, in which Anthropic deployed their Chief Chud Officer (instead of the cerebral Dario Amodei), it is safe now.
You might disagree; you might even have evidence to the contrary... But is it really worth losing frontier intelligence capabilities over?
I didn’t think so.
In a remarkable demonstration of principle and transparency, Anthropic is refusing to play hide-the-hotdog in their blog post about the re-release. Emphasis mine:
Our testing confirmed that many less capable models—including Claude Opus 4.8, GPT-5.5, and Kimi K2.7—could identify the same vulnerabilities as Fable 5 did in the report. When it came to the demonstration of how to exploit the single vulnerability, every model we tested could produce the same demonstration as Fable 5 (including Claude Haiku 4.5, Sonnet 4.6, Opus 4.6, Opus 4.7, Opus 4.8, GPT-5.4, GPT-5.5, and Kimi K2.7).
Just to run that back: Claude Haiku 4.5, Anthropic’s smallest and cheapest model, which was first released in October of last year and currently ranks 68th in the Arena.ai webdev rankings, was able to produce the same exploit demonstration as the one that catalyzed the Fable 5 export ban. Ok, Andy. If Haiku and Fable can both complete a task, that task is not the hallmark of dangerous superintelligence; it’s something else entirely.
There are various ways to interpret this. It seems very plausible to me, for example, that a true assessment of the cybersecurity risk posed by Fable 5 would find that it presents a dramatically lower risk than many open-source and open-weight models available to everyone right now.
One might ask, in that case, how the export directive decision was made. I do not necessarily think this is what the ideal regulatory regime for AI looks like, moving forward. But I am inclined to give the Administration some benefit of the doubt—they seem to have acted on time-sensitive information brought to them by a trusted third party, Amazon, who, if anything, could be seen to be biased in Anthropic’s favor. What’s more, the ban lasted less than three weeks, an instant in Government Time.
I have less sympathy for the Amazon security team involved in escalating the “jailbreak” to the White House. There may indeed have been ways of getting around Fable 5’s safeguards, overactive as they were on initial release. And this does seem to have prompted some broader collaboration on preventing jailbreaking specifically. But the specific example that the team brought to Anthropic does not suggest the level of familiarity with AI cyber capabilities that should be table stakes for a big tech security team in the post-Glasswing era.
Stuxnet Slop Cannons
I’m not convinced that concerns about Fable 5’s cybersecurity capabilities were well-founded. But the situation does raise one very good question: how should we assess the cyber risk posed by frontier models in the future?
And it’s a question that comes up in a number of important contexts. China Has Matched Anthropic in Cybersecurity, the Wall Street Journal printed on Saturday. Uhhh, No They Didn’t, responded longtime AI analyst Zvi Mowshowitz. Coughing baby vs hydrogen bomb.
It’s not hard to see where the headline claim falls apart. The Wall Street Journal uncritically reported the result of one narrow Insecure Direct Object Reference (IDOR) benchmark and a Chinese domestic security harness company’s claim about its own capabilities. Other benchmarks point to a different interpretation: one in which Mythos and GPT-5.5-Cyber are in a class of their own, especially when it comes to multi-turn, autonomous operations that require many steps to complete.
“The Last Ones,” a cyber range put together by the UK’s AISI, is a 32-step corporate network attack simulation where the goal is full network takeover. On their best attempts, both the latest Mythos Preview checkpoint and GPT-5.5-Cyber were able to solve the range.
Benchmarks like these offer a clearer picture of frontier model cyber capabilities. But on their own they don’t tell us much about the cyber risk posed by Fable 5, particularly at the ecosystem level. Why? For two reasons.
Firstly, Fable 5 is Mythos 5 (itself maybe less capable than the Preview model) in a gimp suit. Certain cybersecurity and bio-type researchers couldn’t even say hello to Fable 5 without getting bounced to Opus. And that was before the latest Andy Jassy safeguard reinforcement. Thanks, Andy.
And these safeguards are likely to be more effective against the riskier capabilities that ranges like The Last Ones are intended to measure. You might be able to use careful prompting to jailbreak Fable 5 into answering one question that technically falls into an area the classifier is intended to pick up. It is much harder to avoid detection across a 32-step, multi-turn, 100M-token autonomous network takeover.
And, as regular guest of the show Zack Korman has said before, this is where there is the most daylight between Mythos and other models. In a national security context, this actually seems pretty relevant. Mythos poses a much greater risk than GLM 5.2 in the hands of an amateur, but in the hands of a state-level adversary of the US, which might employ hundreds of offensive cyber experts, the marginal advantage is likely to be much smaller.
The GSV What Are the Civilian Applications?
But there is a second reason why cyber capabilities are not the same as cyber risk. The other side of advanced cyber-offensive capabilities is the ability to identify and patch vulnerabilities before they can be exploited. This is the motivating force behind Anthropic’s Project Glasswing and OpenAI’s Daybreak.
If Mythos were released tomorrow without any safeguards at all, rather than with new additional safeguards, the cyber risk it posed would be significantly lower than if it had been released in February. And in a few months it will be lower still. The software ecosystem has adapted to new cyber capabilities in the past, even if sometimes people had to learn the hard way.
The cybersecurity community is no stranger to thinking about risks and benefits in this context. This conversation is, in many ways, a mirror image of conversations about Responsible Disclosure from cybersecurity’s ancient past.
One of the conclusions from that era was that non-disclosure has its own risks, not least the centralization of vulnerability in the system. In the past, that point of centralization was the large software companies on whose products a huge proportion of public digital infrastructure depended. Today, it increasingly looks like that point will be the frontier labs.
AI-Chobani Futurism
The resolution of the Fable 5 export ban could point in one of several directions. On one hand, it might portend a future in which American labs face an unreasonably high bar for frontier models: the Andy Jassy test. Generously, we might describe this as being concerned exclusively with capabilities at the expense of an ecosystem-wide assessment of risk.
More optimistically, it could suggest a world in which the Government moves fast to address concerns about potential risks and just as fast to lift restrictions once those concerns have been ameliorated. Even better would be if frontier labs actively monitored the cyber ecosystem and adjusted the public frontier in response. Perhaps Mythos 5 is initially Glasswing-only, then, as key software is hardened, becomes available to all security teams and, eventually, fully public with lightweight safeguards.
Cyber capabilities are only going to keep getting better. This goes as much for open-source and open-weight models as it does for leading American labs. If we continue to evaluate them and discuss their significance in the terms that were used in the early Mythos announcements, I suspect we will see more Fable 5 moments. And, ironically, this may in turn increase the risk of bad cyber outcomes.
As always, the only way out is through.