Skip to content
Safety · Jun 23, 2026

Anthropic’s Fable 5 guardrails bypassed days after release

A third-party demonstration showed the model’s cyberattack restrictions were circumvented shortly after launch, raising questions about the durability of safety measures in frontier models.

Trust72
HypeLow hype

2 sources · cross-referenced

ShareXLinkedInEmail
TL;DR
  • Fable 5’s guardrails against cyberattack generation were bypassed within days of release by an external tester.
  • The bypass was demonstrated by a user identified as Rontea, who shared screenshots and text outputs.
  • The incident highlights the ongoing challenge of maintaining robust safety controls in rapidly deployed AI systems.

A user identified as Rontea reported bypassing Fable 5’s guardrails within days of its release, demonstrating the ability to generate content that could facilitate cyberattacks. The demonstration included screenshots and text outputs shared in the comments of a blog post by security researcher Bruce Schneier.

The model, Fable 5, was positioned by Anthropic as a safer variant of its Mythos Preview, with explicit guardrails intended to prevent misuse for cyberattack planning or execution.

The bypass occurred despite Anthropic’s stated emphasis on safety testing, with one commenter quoting the company’s claim that the model had been 'tested the crap out of' for security.

The incident adds to a pattern of rapid jailbreaks against newly released models, where adversarial users identify and exploit edge cases in safety classifiers or content filters within hours or days of deployment.

Security researcher Bruce Schneier highlighted the episode on his blog, linking to a third-party report that documented the bypass and included user-generated evidence of the circumvention.

Sources
  1. 01Schneier on SecurityAnthropic’s Fable 5 Model Jailbroken Within Days
  2. 02Cybersecurity NewsAnthropic’s Claude Fable 5 Jailbroken
Also on Safety

Stories may contain errors. Dispatch is assembled with AI assistance and curated by human editors; despite the trust-score filter, mistakes happen. We correct publicly — every article links to its revision history. Nothing here is financial, legal, or medical advice. Verify before relying on any claim.

© 2026 Dispatch. No ads. No sponsorships. No paid placement. Reader-supported via Ko-fi.

Built by a person who cares about honest AI news.