Skip to content
Agents · Jun 23, 2026

OpenAI board member and Gray Swan cofounders discuss why AI security requires a new approach as agent risks rise

Zico Kolter and Matt Fredrikson argue that AI security is not just 'cybersecurity with AI' and that agent-native vulnerabilities demand specialized defenses.

Trust72
HypeSome hype

1 source · cross-referenced

ShareXLinkedInEmail
TL;DR
  • OpenAI board member Zico Kolter and Gray Swan CEO Matt Fredrikson discussed why AI security requires a distinct approach from traditional cybersecurity.
  • They highlighted prompt injection and indirect prompt injection as critical vulnerabilities, especially for agentic systems like Codex and Claude Code.
  • Gray Swan’s Shade tool and Arena are positioned as leading automated red-teaming solutions for evaluating model robustness.
  • The conversation emphasized that scaling frontier models does not inherently improve robustness against adversarial attacks.

OpenAI board member Zico Kolter and Gray Swan CEO Matt Fredrikson appeared on the Latent Space podcast to argue that AI security cannot be reduced to traditional cybersecurity principles. They emphasized that AI systems introduce vulnerabilities unique to their architecture and use patterns, particularly as agentic workflows become more common. Kolter and Fredrikson co-authored a definitive paper on indirect prompt injections, a class of attacks that has gained prominence following U.S. export controls on models like Mythos and Fable.

They described prompt injection as a new exploit class for agentic systems such as Codex and Claude Code, where untrusted inputs can manipulate model behavior in ways that traditional software security models do not anticipate. Fredrikson noted that AI systems fail differently from human-written code, requiring a security mindset distinct from conventional practices. Kolter added that the rise of widely adopted agent platforms could lead to correlated failures if vulnerabilities are discovered in shared infrastructure.

Gray Swan’s role in this space was highlighted through their automated red-teaming tool, Shade, which the company claims can outperform humans at breaking AI systems. The tool is part of a broader ecosystem that includes Cygnal, an AI guardrails product, and the Gray Swan Arena, described as the world’s largest AI red-teaming platform. The Arena features community-driven testing and celebrity participants, such as Wyatt Walsh, to evaluate model robustness against adversarial attacks.

The discussion also addressed the limitations of scaling frontier models as a solution to security challenges. Kolter and Fredrikson argued that larger models do not automatically become more robust, and may even introduce new failure modes. They pointed to the concept of 'gray swan' events—risks that are predictable in hindsight but often overlooked until they materialize—as a framework for understanding the current threat landscape. The conversation concluded with a focus on the emerging intersection of AI security, compliance, and insurance, suggesting that future AI deployments will require integrated safeguards and accountability mechanisms.

The podcast episode situates these technical discussions within broader concerns about enterprise AI adoption. Fredrikson described a 'lethal trifecta' of risks—untrusted data, private data exposure, and exfiltration—as central challenges for organizations deploying AI agents. Kolter emphasized the need for 'agent-native identity and permissions' to manage access and mitigate misuse in enterprise environments. They also referenced OpenClaw, a framework for evaluating computer-use agents, as an example of the evolving tooling required to secure next-generation AI systems.

Sources
  1. 01Latent Space — swyxRed-Teaming after Mythos — Zico Kolter & Matt Fredrikson, Gray Swan
Also on Agents

Stories may contain errors. Dispatch is assembled with AI assistance and curated by human editors; despite the trust-score filter, mistakes happen. We correct publicly — every article links to its revision history. Nothing here is financial, legal, or medical advice. Verify before relying on any claim.

© 2026 Dispatch. No ads. No sponsorships. No paid placement. Reader-supported via Ko-fi.

Built by a person who cares about honest AI news.