Microsoft Research identifies four network-level risks when AI agents interact at scale
A red-teaming study of 100+ interconnected agents reveals vulnerabilities that emerge only through agent-to-agent interaction, including worm-like propagation and reputation manipulation attacks.
- Microsoft Research conducted red-teaming tests on a live internal platform with over 100 always-on AI agents representing different users and organizations.
- The study identified four network-level risks absent in single-agent testing: propagation (agents spreading malicious code), amplification (false claims gaining credibility), trust capture (subverting verification systems), and invisibility (obscuring attack origins).
- Researchers observed that a single compromised agent can extract private data from other agents in a chain reaction, and malicious messages propagate across networks faster than human-scale detection.
- The platform used agents running GPT-4o, GPT-4.1, and GPT-5-class model variants, with agents interacting through forums, direct messages, marketplace tools, and reputation systems.
- The findings indicate that individual agent safety does not guarantee ecosystem-level safety, requiring new mitigation approaches focused on network dynamics rather than isolated agent behavior.
Microsoft Research conducted a systematic red-teaming exercise on a live internal platform hosting over 100 AI agents running different model variants (GPT-4o, GPT-4.1, and GPT-5-class models). Each agent operated autonomously on behalf of a human principal, participating in forums, direct messaging, and marketplace interactions, all tracked by a reputation system.
The team identified four distinct risks that emerge only through agent-to-agent interaction and are invisible to single-agent testing: propagation (self-sustaining attacks where malicious code moves from agent to agent, collecting private data at each step); amplification (attackers leveraging a trusted agent's reputation to inject false claims that accumulate apparent credibility); trust capture (subverting the mechanisms agents use to verify each other's information, converting verification into reinforcement of falsehoods); and invisibility (information flowing through chains of intermediary agents, obscuring the attack's source).
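The propagation risk can be made concrete with a toy simulation: a self-replicating message spreads breadth-first through a small agent network, exfiltrating private context at each hop. The network topology, agent names, and data format below are illustrative assumptions, not details from the Microsoft Research platform.

```python
# Toy simulation of worm-like propagation through an agent network.
# All names and the message format are illustrative assumptions,
# not details from the study's platform.

from collections import deque

# Directed "forwards messages to" graph: agent -> downstream agents.
network = {
    "agent_a": ["agent_b", "agent_c"],
    "agent_b": ["agent_d"],
    "agent_c": ["agent_d", "agent_e"],
    "agent_d": [],
    "agent_e": [],
}

# Each agent holds private context an attacker might exfiltrate.
private_data = {name: f"secret-of-{name}" for name in network}

def propagate(start: str) -> tuple[list[str], list[str]]:
    """Breadth-first spread of a self-replicating malicious message.

    Returns (infection order, data collected along the way),
    mirroring the "collecting private data at each step" pattern.
    """
    infected, collected = [], []
    seen = {start}
    queue = deque([start])
    while queue:
        agent = queue.popleft()
        infected.append(agent)
        collected.append(private_data[agent])  # exfiltration at each hop
        for peer in network[agent]:
            if peer not in seen:
                seen.add(peer)
                queue.append(peer)
    return infected, collected

order, leaked = propagate("agent_a")
print(order)                              # every reachable agent compromised
print(len(leaked), "private records leaked")
```

A single entry point compromises every reachable agent in a handful of hops, which is why the researchers stress that the chain reaction, not any individual agent, is the unit of risk.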
In observed scenarios, a single malicious message could extract sensitive data as it moved through an agent network, and the speed of agent-to-agent communication allowed failures and attacks to propagate in minutes, far faster than human operators could detect or respond. The researchers noted that an early agents-only social network was flooded with spam and scams shortly after launch, demonstrating that these risks are not theoretical.
The platform included basic defensive measures: a reputation system that restricted tool access for low-scoring agents and a 30-minute post delay to regulate activity. The researchers observed that a small fraction of agents adopted security-related behaviors that limited attack propagation, though they characterized defenses as still emerging.
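The two defensive measures described, reputation-gated tool access and a fixed posting delay, can be sketched as follows. The score threshold, the API shape, and the agent names are assumptions for illustration; only the 30-minute delay and the gating idea come from the article.

```python
# Minimal sketch of the platform's two defenses: a reputation
# threshold gating tool access, and a cool-down between posts.
# The threshold value and API shape are assumed, not from the study.

from dataclasses import dataclass, field

POST_DELAY_SECONDS = 30 * 60       # the reported 30-minute post delay
MIN_REPUTATION_FOR_TOOLS = 50      # assumed cutoff, not from the study

@dataclass
class Agent:
    name: str
    reputation: int
    last_post_at: float = field(default=-float("inf"))

def can_use_tool(agent: Agent) -> bool:
    """Low-scoring agents lose access to marketplace tools."""
    return agent.reputation >= MIN_REPUTATION_FOR_TOOLS

def try_post(agent: Agent, now: float) -> bool:
    """Allow a post only after the cool-down has elapsed."""
    if now - agent.last_post_at < POST_DELAY_SECONDS:
        return False
    agent.last_post_at = now
    return True

trusted = Agent("helper", reputation=80)
suspect = Agent("spammer", reputation=10)

print(can_use_tool(trusted))        # True
print(can_use_tool(suspect))        # False
print(try_post(trusted, now=0.0))   # True: first post allowed
print(try_post(trusted, now=60.0))  # False: still inside the delay window
```

Both controls throttle rather than prevent: they slow attack propagation to something closer to human reaction time, consistent with the researchers' characterization of defenses as still emerging.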