Safety · Apr 19, 2026

Anthropic restricts access to vulnerability-finding AI, raising questions about disclosure and governance

Schneier examines the tradeoffs in Anthropic's decision to limit Claude Mythos to 50 organizations, arguing that responsible disclosure requires more transparency and broader research access.

Trust: 55 · Hype: Some hype

5 sources · cross-referenced

TL;DR
  • Anthropic announced Claude Mythos Preview, an AI model restricted to roughly 50 organizations under Project Glasswing, citing its dangerous capability to find and exploit software vulnerabilities across major operating systems and browsers.
  • The model reportedly discovered thousands of vulnerabilities across widely used software, including bugs that had persisted for decades, and could weaponize Firefox flaws into 181 usable attacks, compared with its predecessor's two.
  • Schneier argues that Anthropic has provided only a 'highlight reel' of successes without disclosing critical information about false positive rates, limiting independent evaluation of the model's true capabilities.
  • Concentrating access among major tech vendors leaves coverage gaps in specialized software domains such as industrial controls, medical devices, and regional banking, where attackers with domain expertise could use Mythos as a force multiplier.
  • Schneier calls for mandatory transparency, globally coordinated auditing frameworks, and funded academic access rather than private-company-led governance of such powerful security tools.

Anthropic introduced Claude Mythos Preview last week, a vulnerability-detection AI model the company deemed too risky for public release. Instead of general availability, access was capped at approximately 50 organizations, including cloud providers and infrastructure vendors, through an initiative called Project Glasswing. The company released highlights of the model's capabilities: discovery of thousands of vulnerabilities spanning major operating systems and browsers, including flaws that had persisted for decades, and the ability to convert Firefox security gaps into 181 functional attacks, up from its predecessor's two.

Schneier acknowledges that restricting access to a powerful security tool reflects responsible practice in principle. He highlights a critical gap, however: Anthropic disclosed that security contractors reviewed 198 of the model's findings and agreed with its severity ratings 89 percent of the time, but it published no data on false positive rates. Research on similar AI systems shows they often surface genuine, accurately characterized bugs alongside plausible-sounding false alarms. Without the unfiltered output data, the disclosed examples may not represent typical model behavior.
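
To see why the omission matters, consider a minimal back-of-the-envelope sketch in Python. The 198 reviewed findings and 89 percent severity agreement are the disclosed figures; the raw-output and genuine-bug counts below are invented purely to illustrate how wide a range of precision those disclosed numbers leave open.

```python
# Back-of-the-envelope triage arithmetic. Only the first two values were
# disclosed; everything else is assumed for illustration.

reviewed_findings = 198    # findings security contractors reviewed (disclosed)
severity_agreement = 0.89  # agreement with the model's severity ratings (disclosed)

# Undisclosed: how many raw findings the model emitted, and how many were real.
# Two hypothetical scenarios, both consistent with the disclosed numbers:
for raw_findings, genuine_bugs in [(250, 230), (5000, 230)]:
    false_positives = raw_findings - genuine_bugs
    precision = genuine_bugs / raw_findings
    print(f"raw findings={raw_findings:>4}  genuine={genuine_bugs}  "
          f"false alarms={false_positives:>4}  precision={precision:.0%}")
```

In the first scenario precision is 92 percent; in the second it is 5 percent, and defenders would drown in false alarms. Both are compatible with everything Anthropic disclosed, which is exactly why only unfiltered output data can distinguish them.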

A second structural concern centers on software coverage. Mythos was trained primarily on widely distributed open-source projects, major browsers, and popular frameworks: precisely the systems used by the 50 organizations granted access, which gives mainstream software a natural head start on patching. Conversely, software outside the training distribution (industrial control systems, medical device firmware, specialized financial infrastructure, older embedded systems) is where the raw model is, in theory, least capable. Yet an adversary with specialized knowledge in these fields could use the model's reasoning capabilities as a force multiplier to probe systems that Anthropic's own engineers lack the expertise to audit.

Schneier argues that meaningful risk reduction would require structured access for academic researchers, domain specialists, and civil-society organizations—cardiologists partnering with medical device security teams, control-systems engineers, and researchers focused on less mainstream software ecosystems. Fifty companies, however carefully selected, cannot replicate the distributed expertise that would enable comprehensive independent evaluation and defense across the full spectrum of critical infrastructure. He notes that OpenAI has announced similar restrictions on its GPT-5.4-Cyber model, and that smaller, cheaper public AI models have partially replicated some of Anthropic's published findings, raising questions about the true novelty of these releases.

Beyond capability assessment, Schneier frames the governance question as one of democratic legitimacy. A private company, even one acting in good faith, should not unilaterally decide which pieces of global critical infrastructure receive security resources first. He calls for mandatory transparency: aggregate performance metrics, independent auditing frameworks, and funded access for researchers outside the vendor circle. Absent such frameworks, he writes, each restricted release of a Mythos-class model pushes society toward an unobservable precipice of unknown vulnerabilities, without meaningful public input into decisions that affect collective security.

Sources
  1. Schneier on Security: Mythos and Cybersecurity
  2. Anthropic: Claude Mythos Preview
  3. Global News: Anthropic AI model too powerful
  4. The Hill: Anthropic new AI dangerous public
  5. Anthropic: Project Glasswing

Stories may contain errors. Dispatch is assembled with AI assistance and curated by human editors; despite the trust-score filter, mistakes happen. We correct publicly — every article links to its revision history. Nothing here is financial, legal, or medical advice. Verify before relying on any claim.

© 2026 Dispatch. No ads. No sponsorships. No paid placement. Reader-supported via Ko-fi.

Built by a person who cares about honest AI news.