Safety · Jun 23, 2026

Anthropic’s Fable 5 guardrails bypassed days after release

A third-party demonstration showed the model’s cyberattack restrictions were circumvented shortly after launch, raising questions about the durability of safety measures in frontier models.

TL;DR

Fable 5’s guardrails against cyberattack generation were bypassed within days of release by an external tester.
The bypass was demonstrated by a user identified as Rontea, who shared screenshots and text outputs.
The incident highlights the ongoing challenge of maintaining robust safety controls in rapidly deployed AI systems.

A user identified as Rontea reported bypassing Fable 5’s guardrails within days of its release, demonstrating the ability to generate content that could facilitate cyberattacks. The demonstration included screenshots and text outputs shared in the comments of a blog post by security researcher Bruce Schneier.

Anthropic positioned Fable 5 as a safer variant of its Mythos Preview, with explicit guardrails intended to prevent its misuse for planning or executing cyberattacks.

The bypass occurred despite Anthropic’s stated emphasis on safety testing; one commenter quoted the company as saying it had 'tested the crap out of' the model for security.

The incident adds to a pattern of rapid jailbreaks against newly released models, where adversarial users identify and exploit edge cases in safety classifiers or content filters within hours or days of deployment.

Security researcher Bruce Schneier highlighted the episode on his blog, linking to a third-party report that documented the bypass and included user-generated evidence of the circumvention.