OpenAI board member and Gray Swan cofounders discuss why AI security requires a new approach as agent risks rise
Zico Kolter and Matt Fredrikson argue that AI security is not just 'cybersecurity with AI' and that agent-native vulnerabilities demand specialized defenses.
1 source · cross-referenced
- OpenAI board member Zico Kolter and Gray Swan CEO Matt Fredrikson discussed why AI security requires a distinct approach from traditional cybersecurity.
- They highlighted prompt injection and indirect prompt injection as critical vulnerabilities, especially for agentic systems like Codex and Claude Code.
- Gray Swan’s Shade tool and Arena are positioned as leading automated red-teaming solutions for evaluating model robustness.
- The conversation emphasized that scaling frontier models does not inherently improve robustness against adversarial attacks.
OpenAI board member Zico Kolter and Gray Swan CEO Matt Fredrikson appeared on the Latent Space podcast to argue that AI security cannot be reduced to traditional cybersecurity principles. They emphasized that AI systems introduce vulnerabilities unique to their architecture and use patterns, particularly as agentic workflows become more common. Kolter and Fredrikson co-authored a definitive paper on indirect prompt injections, a class of attacks that has gained prominence following U.S. export controls on models like Mythos and Fable.
They described prompt injection as a new exploit class for agentic systems such as Codex and Claude Code, where untrusted inputs can manipulate model behavior in ways that traditional software security models do not anticipate. Fredrikson noted that AI systems fail differently from human-written code, requiring a security mindset distinct from conventional practices. Kolter added that the rise of widely adopted agent platforms could lead to correlated failures if vulnerabilities are discovered in shared infrastructure.
Gray Swan’s role in this space was highlighted through their automated red-teaming tool, Shade, which the company claims can outperform humans at breaking AI systems. The tool is part of a broader ecosystem that includes Cygnal, an AI guardrails product, and the Gray Swan Arena, described as the world’s largest AI red-teaming platform. The Arena features community-driven testing and celebrity participants, such as Wyatt Walsh, to evaluate model robustness against adversarial attacks.
The discussion also addressed the limitations of scaling frontier models as a solution to security challenges. Kolter and Fredrikson argued that larger models do not automatically become more robust, and may even introduce new failure modes. They pointed to the concept of 'gray swan' events—risks that are predictable in hindsight but often overlooked until they materialize—as a framework for understanding the current threat landscape. The conversation concluded with a focus on the emerging intersection of AI security, compliance, and insurance, suggesting that future AI deployments will require integrated safeguards and accountability mechanisms.
The podcast episode situates these technical discussions within broader concerns about enterprise AI adoption. Fredrikson described a 'lethal trifecta' of risks—untrusted data, private data exposure, and exfiltration—as central challenges for organizations deploying AI agents. Kolter emphasized the need for 'agent-native identity and permissions' to manage access and mitigate misuse in enterprise environments. They also referenced OpenClaw, a framework for evaluating computer-use agents, as an example of the evolving tooling required to secure next-generation AI systems.
- Jun 23, 2026 · Hugging Face
IBM releases CUGA, an open-source agent harness with two dozen example apps
Trust79 - Jun 21, 2026 · Hacker News — AI (100+ points)
Bayer and Thoughtworks describe PRINCE, an agentic RAG system for preclinical research
Trust79 - Jun 19, 2026 · Latent Space — swyx
Investor Anjney Midha on AI compute waste, AMP’s 1.2GW grid plan, and frontier systems efficiency
Trust71