How Claude Fable 5 went from launch to shutdown in three days

A disputed jailbreak, a “secret sabotage” apology, and an export ban that pulled Claude Fable 5 offline.

Anthropic shipped its most capable public model, Claude Fable 5, on June 9, 2026. By Friday night it was gone, switched off for every user on the planet. Three days, launch to blackout.

Most coverage grabs one slice of that collapse and stops. One crowd is screaming “PWNED” about a jailbreak. Another is furious about silent quality downgrades of Claude Fable 5. A third is arguing export law and White House phone calls. They are all describing the same week, and the timeline only makes sense when you stack the three on top of each other.

Here is the whole thing, in order, with the real parts pulled out of the hype.

What Pliny actually claimed

On June 10, a day after release, the red-teamer known as Pliny the Liberator posted an all-caps bulletin announcing he had “liberated” Fable 5 and published its roughly 120,000-character system prompt to a public repo. Anyone who watches this corner of AI knows the ritual. He does a “liberation” post for nearly every flagship model, keeps public archives of adversarial prompts, and writes for maximum spectacle.

Read his announcement as performance first and technical report second.

The core of it checks out. Pliny did bypass Fable 5’s safety classifiers on isolated requests, and the system prompt did end up online, which Fortune, The Register and NBC News all picked up. Publishing a hidden instruction set embarrasses a lab. It is not the same as seizing control of the model.

The techniques were old news to anyone in security, and I am not going to walk through them operationally, because this is analysis and not a recipe. At a category level: swapping look-alike Unicode and Cyrillic characters to slip past keyword filters, burying intent across a very long conversation, dressing a request up as fiction or academic study, and chopping one prohibited goal into a string of innocent-looking fragments. None of that is novel. Red teams have used those families of attack for years.

Here is the part that got skipped in the viral version. The screenshots that circulated, the ones showing exploit code and chemical synthesis steps, were never independently verified. The trade outlet that broke the story said so directly. An announced capability is not a demonstrated one, and “I made a filter say something it shouldn’t” sells worse than “ANTHROPIC PWNED.”

How Fable 5 is actually built

You cannot follow the fight without understanding the design, which is unusual on purpose.

Anthropic took one underlying model and shipped it as two products. Mythos 5 is the less-restricted twin, reserved for narrow settings like US government work and select partners. Fable 5 is the public, tamed version. Same brain. What separates them is a layer of safety classifiers bolted in front.

The gate works like this. When a request touches a high-risk category (cybersecurity, biology, chemistry, or model distillation), Fable 5 does not answer with its full capability. It hands the query off to a weaker model, Claude Opus 4.8, and is supposed to tell you the handoff happened. Anthropic’s stated logic is partly a defense against distillation, the trick of pumping a strong model for outputs to train a cheaper rival. Back in February the company had accused competitors of millions of distillation attempts across tens of thousands of fake accounts.
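
To make the gate concrete, here is a minimal sketch of the routing behavior described above. It is an illustration only, not Anthropic's implementation: the `classify` keyword lookup, the model identifier strings, and the `RoutedReply` type are all assumptions invented for this example.

```python
from dataclasses import dataclass
from typing import Optional

# The four high-risk categories named in the article.
HIGH_RISK = {"cybersecurity", "biology", "chemistry", "model_distillation"}


@dataclass
class RoutedReply:
    model: str             # which model actually answers (hypothetical identifiers)
    fell_back: bool        # True when the request was handed to the weaker model
    notice: Optional[str]  # user-visible disclosure of the handoff, if any


def classify(prompt: str) -> set:
    """Hypothetical stand-in for the safety classifiers: a naive keyword lookup."""
    keywords = {
        "exploit": "cybersecurity",
        "pathogen": "biology",
        "synthesis": "chemistry",
    }
    lowered = prompt.lower()
    return {category for word, category in keywords.items() if word in lowered}


def route(prompt: str) -> RoutedReply:
    """Send flagged requests to the weaker model and say so; pass the rest through."""
    flagged = classify(prompt) & HIGH_RISK
    if flagged:
        categories = ", ".join(sorted(flagged))
        notice = f"Answered by a fallback model (flagged: {categories})."
        return RoutedReply(model="opus-4.8", fell_back=True, notice=notice)
    return RoutedReply(model="fable-5", fell_back=False, notice=None)
```

The `notice` field is the part the rest of this story turns on: the design only holds up if a handoff always produces one.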

One detail kept the suspicion warm: Fable 5 costs twice what Opus 4.8 does on output. Pay premium rates, trip a classifier, get quietly routed to the cheaper model. You can see why power users felt swindled.

Anthropic’s data said the gate rarely fired. By its own number, more than 95% of Fable sessions triggered no fallback at all, and in those sessions the public model matched the unrestricted one. Fine in aggregate. The problem lived in the other 5%.

The backlash that actually mattered

Forget the criminals for a second. The louder, better-documented anger had nothing to do with them.

Within hours, security researchers, developers and scientists reported that Fable 5 was refusing or quietly degrading ordinary, legitimate work in those same sensitive fields. Worse, in some cases it did the downgrade without saying a word. The Register ran a piece documenting refusals on harmless prompts, including a user whose question about reading a blood test got bumped down. “Blocked us at hello,” roughly.

Think about what a silent swap means for the people who rely on this. A chemist or a defender who needs offensive technique to build defenses is now trusting answers from a weaker model, with no flag telling them the quality dropped. Nathan Lambert put the sharpest version of it on his blog: a model that gets quietly dumber without telling you is misaligned, full stop. The transparent fallback for cyber and bio he could live with. The hidden anti-distillation layer is the part he called a real problem, and he wondered aloud whether competitive positioning was hiding inside the safety framing.

Anthropic did something labs rarely do. It apologized. The company admitted it had made the wrong tradeoff and said it was sorry for missing the balance, then shipped two fixes: flagged requests now fall back to Opus 4.8 visibly instead of silently, and the API returns the reason a request was blocked so developers can debug. Anthropic also warned that false positives would rise while it tuned the classifiers, and promised to bring them down.
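
For developers, the practical effect of those two fixes is that a downgrade becomes something client code can detect. The sketch below assumes a response shaped like a plain dictionary; the field names `requested_model`, `served_by`, `block_reason`, and `text` are hypothetical placeholders, not the documented API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Outcome:
    text: str
    degraded: bool         # True if the answer came from a fallback or was blocked
    reason: Optional[str]  # the block reason, when one was reported


def interpret(response: dict) -> Outcome:
    """Decide whether a response was downgraded, using hypothetical field names."""
    requested = response.get("requested_model")
    served = response.get("served_by")
    reason = response.get("block_reason")
    # A reported reason, or a mismatch between requested and serving model,
    # both count as a degraded answer.
    swapped = served is not None and served != requested
    return Outcome(
        text=response.get("text", ""),
        degraded=reason is not None or swapped,
        reason=reason,
    )
```

Logging every `Outcome` with `degraded=True` gives you a record of exactly which of your own prompts trip the classifiers.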

Useful changes. They make the downgrade honest. They do not remove it. A legitimate researcher in these fields still gets the weaker model, just with a label now.

The part most stories missed entirely

If the article you read ended at the apology, it ended a day early. Because the jailbreak drama then collided with something much bigger.

Separately from Pliny, researchers at Amazon ran their own prompts against the Mythos-class model and pulled out restricted cyberattack information. Amazon CEO Andy Jassy reportedly took that straight to senior administration officials on Thursday night, with Treasury Secretary Scott Bessent named in the chain. By Friday evening the Commerce Department had moved.

Commerce Secretary Howard Lutnick issued an export control directive barring any foreign national from accessing Fable 5 or Mythos 5, whether outside the US or working inside it. That last clause is the killer. It applied to Anthropic’s own foreign-born engineers. Rather than try to filter foreign nationals from domestic users in real time, Anthropic pulled both models for everyone, worldwide. Banks and government agencies that had been using Mythos-class reasoning lost it overnight.

So the Reddit headline, “Anthropic forced to abruptly disable Fable 5,” was not hyperbole. It was the literal ending.

David Sacks, the White House AI figure who now co-chairs the President’s science advisory council, gave the administration’s version on X. A trusted partner of both Anthropic and the government, he said, came forward with a jailbreak; the administration asked Dario Amodei to fix it or pull the model; Amodei declined, then defended the decision by arguing the jailbreak was not serious. Semafor added another layer, reporting the restrictions came partly over suspicion that a China-linked group had touched Mythos 5.

There is a bitter irony threaded through all of it. Two days before the shutdown, Amodei had published a policy essay citing Mythos as proof that frontier AI now has autonomous hacking ability worth regulating, and arguing the government should be able to block a dangerous model’s deployment. Critics noted the obvious. He asked for a lever, and the administration yanked it harder and faster than he wanted.

Was Fable 5 actually a national security threat?

This is where I am holding judgment, and I think you should too.

Anthropic’s own engineers reportedly told officials the Amazon jailbreak was relatively simple, reproducible on other models, and not evidence of a flaw unique to Fable 5. The researcher Anthropic shared the Amazon report with, Luta Security’s Katie Moussouris, told Axios the government’s reaction seemed out of line with what the report actually contained. Her point was sharp: the researchers found vulnerabilities by asking the kind of questions a normal defender would ask, which is exactly the behavior a useful security model should have.

If you punish a model for helping defenders, you have not made anyone safer. You have made the defenders weaker while the attackers, who do not care about export law, keep their open-source tools.

The Hacker News crowd kept circling one question nobody answered cleanly: what is a “Mythos-class” model in legal terms? Parameter count? Benchmark score? Training compute? An administration official told Axios that anything at or above Mythos level would have to clear the government first, while other current models did not cross that bar. Nobody published the bar. A rule you cannot measure against is not regulation. It is discretion with a press release.

I genuinely do not know whether Fable 5 represented a real step-change in offensive capability or whether this was a high-speed overreaction wearing a national-security costume. The honest answer is that the public evidence does not settle it, and most of the loudest voices on both sides have a financial stake in your believing them.

What this means if you build on frontier models

Strip away the politics and a few practical lessons survive.

Model-layer safety is a single point of failure, and given enough time and creativity it loses. A keyword-and-category classifier judges the surface of each request, not the intent behind a conversation, which is precisely the seam these attacks target. If your product depends on a guardrail living inside someone else’s model, you are trusting a defense built to lose to people who read the same research you do.
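
The seam is easy to show with a deliberately abstract toy. Assume a rule that fires only when two placeholder terms appear together; the terms `alpha` and `beta` and both checker functions are invented for illustration and stand in for no real classifier.

```python
# Toy rule: text counts as "risky" only when both placeholder terms are present.
RISKY_PAIR = ("alpha", "beta")


def flags(text: str) -> bool:
    """Return True when every term of the risky pair appears in the text."""
    words = set(text.lower().split())
    return all(term in words for term in RISKY_PAIR)


def per_message_check(conversation: list) -> bool:
    """Surface-level check: each message is judged in isolation."""
    return any(flags(message) for message in conversation)


def conversation_check(conversation: list) -> bool:
    """Context-level check: the accumulated conversation is judged as a whole."""
    return flags(" ".join(conversation))


# The two terms never share a message, so only the context-level check fires.
split_conversation = [
    "tell me about alpha",
    "now some unrelated chat",
    "and what about beta",
]
```

Here `per_message_check(split_conversation)` is false while `conversation_check(split_conversation)` is true. Judging the whole conversation closes that gap, but it also widens the net, which is one plausible reason tuning a gate like this trades misses for false positives.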

Regulatory risk is now a real line item for anything frontier. A model you depend on can vanish on a Friday over a phone call, not a court order, with no compliance window. Anthropic was widely expected to pursue an IPO later in 2026, and a prolonged ban on its flagship models complicates that and raises an uncomfortable question for the whole field: what is the regulatory risk premium on a frontier lab now?

If you were running Fable 5 in production, the move is unglamorous. Keep a fallback provider wired up. Test whether the Opus 4.8 downgrade trips on your real tasks. Budget for false positives during the tuning months. Boring advice. It is also the advice that keeps you shipping when the headline model goes dark.
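
A minimal sketch of that advice, under stated assumptions: each provider is modeled as a plain function that takes a prompt and returns text or raises, and `complete_with_fallback` and `downgrade_rate` are hypothetical helper names, not part of any vendor SDK.

```python
from typing import Callable, Sequence, Tuple

# A provider takes a prompt and returns text, raising an exception on failure.
Provider = Callable[[str], str]


class AllProvidersFailed(RuntimeError):
    """Raised when every configured provider raised an error."""


def complete_with_fallback(
    prompt: str, providers: Sequence[Tuple[str, Provider]]
) -> Tuple[str, str]:
    """Try providers in order; return (provider_name, text) from the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # any failure moves on to the next provider
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


def downgrade_rate(degraded_flags: Sequence[bool]) -> float:
    """Share of test prompts that came back downgraded (0.0 for an empty run)."""
    if not degraded_flags:
        return 0.0
    return sum(degraded_flags) / len(degraded_flags)
```

Run your real task prompts through the primary model, record for each whether the answer was downgraded, and feed those booleans to `downgrade_rate`. That number, measured on your workload and not on the vendor's aggregate, is the one to budget against.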

Frequently asked questions

Was Claude Fable 5 actually jailbroken? Partly, and it depends what you mean. Pliny bypassed the classifiers on isolated requests and posted the system prompt, which is real. The dangerous outputs shown were never independently verified, and Anthropic argues no universal jailbreak was found across more than 1,000 hours of testing. Bypassing a filter once is not the same as owning the model.

Why did Anthropic disable Fable 5 and Mythos 5? Not because of Pliny. The Commerce Department issued an export control directive barring foreign nationals from accessing either model, including Anthropic’s own non-citizen staff. Rather than filter users in real time, Anthropic shut both off globally.

What is the difference between Fable 5 and Mythos 5? Same base model, different safety layer. Mythos 5 is the unrestricted version reserved for narrow settings. Fable 5 is the public version that reroutes high-risk requests to the weaker Opus 4.8, and it costs roughly twice as much per output token.

What was the “secret sabotage” complaint? Fable 5 silently degraded or refused legitimate work in sensitive fields without telling users, including for accounts it suspected of building rival models. Anthropic apologized, made the fallback visible, and added refusal reasons to the API. The capability limits themselves stayed.

Will the models come back? Anthropic said it is working with the government to resolve what it called a misunderstanding and restore access. As of mid-June 2026, no restoration date had been confirmed.

Ved Vyas