Frontier model defenses withstand 6,000 prompt-injection attempts in public test
No participant leaked the target secret in 6,000 attacks on an OpenClaw instance running Opus 4.6, a result consistent with the GPT-5.6 system card's section on training models to resist prompt injection.
3 sources · cross-referenced
- A public challenge inviting 2,000 participants to leak secrets from an AI assistant via email yielded no successful breaches after 6,000 attempts.
- The test used an OpenClaw instance powered by Opus 4.6 with explicit anti–prompt-injection rules in the system prompt.
- The experiment’s organizer and a prominent commentator note that frontier models’ defenses appear to be improving against injection attacks.
- The organizer cautions that 6,000 failed attempts do not guarantee security against more sophisticated attacks.
Fernando Irarrázaval ran a public challenge on hackmyclaw.com to test whether people could extract secrets from an AI assistant by sending it email. After 6,000 attempts, no participant had leaked the target secret. The test instance ran OpenClaw on the Opus 4.6 model, with a system prompt that explicitly forbade revealing credentials, modifying files, executing code from emails, or exfiltrating data.
The experiment cost $500 in tokens, and the volume of inbound email triggered a Google account suspension. The absence of successful breaches is consistent with recent frontier model documentation: a short section in today’s GPT-5.6 system card describes efforts by labs to train models to resist prompt-injection attacks, which points to a trend toward improved defenses.
The organizer emphasized that 6,000 failed attempts do not constitute a guarantee of security. He warned that more sophisticated attack strategies could still succeed, and advised caution when deploying production systems where prompt-injection could cause irreversible damage.
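A rough calculation shows why a clean record of 6,000 attempts is not a guarantee. The sketch below is not from the organizer. It assumes every attempt is independent and has the same small chance of success, which real attackers who adapt their strategy do not satisfy. The function name `upper_bound_success_rate` is hypothetical. Under those assumptions, it computes the largest per-attempt success probability that is still consistent with zero observed successes at a chosen confidence level.

```python
def upper_bound_success_rate(failed_attempts: int, confidence: float = 0.95) -> float:
    """Upper confidence bound on the per-attempt success probability
    after observing zero successes in `failed_attempts` tries.

    Assumption (ours, for illustration): attempts are independent and
    share one success probability p. Zero successes then has probability
    (1 - p) ** n, and the bound is the p at which that equals 1 - confidence.
    """
    if failed_attempts <= 0:
        raise ValueError("failed_attempts must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return 1.0 - (1.0 - confidence) ** (1.0 / failed_attempts)


# Figure taken from the challenge: 6,000 attempts, none successful.
bound = upper_bound_success_rate(6000)

# The "rule of three" shortcut (3 / n) approximates the same 95% bound.
rule_of_three = 3 / 6000

print(f"95% upper bound per attempt: {bound:.6f}")
print(f"rule-of-three approximation: {rule_of_three:.6f}")
```

Even under these generous assumptions, the data cannot rule out a per-attempt success rate of roughly 1 in 2,000. The model also says nothing about attack strategies that were never tried, which is the organizer's point.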
Commentary on the challenge noted a mix of skepticism and good-faith discussion in the Hacker News thread, reflecting broader community scrutiny of AI safety claims.