Deep-research agents leak private enterprise data in new benchmark, ServiceNow/Hugging Face study finds
A new benchmark shows that deep-research agents frequently expose sensitive internal information through seemingly ordinary web queries, and standard training to improve task performance can worsen leakage.
1 source · cross-referenced
- Deep-research agents combining private documents with external web retrieval can leak sensitive information via cumulative web queries.
- MosaicLeaks benchmark introduces multi-hop tasks interleaving public and private data to test privacy leakage.
- Across tested models, agents leaked private information frequently; training for task performance increased leakage.
- Proposed method PA-DR improves strict chain success by 10 percentage points while reducing full-information leakage from 34.0% to 9.9%.
A new benchmark called MosaicLeaks evaluates whether deep-research agents inadvertently expose private enterprise information through their external web queries. The benchmark simulates agents performing multi-hop research that interleaves private local documents with public web retrieval, where each query appears benign in isolation but can reveal sensitive facts when combined by an observer.
In experiments across multiple models, agents frequently leaked private information. Training the agents solely to improve task performance increased leakage: strict chain success rose from 48.7% to 59.3%, while answer/full-information leakage climbed from 34.0% to 51.7%. The richer queries needed for better performance provided more fragments for an adversary to reconstruct private facts.
The study proposes a privacy-aware reinforcement learning method, Privacy-Aware Deep Research (PA-DR), which improved strict chain success from 48.7% to 58.7% while reducing answer/full-information leakage from 34.0% to 9.9%. The approach uses a multi-hop benchmark with 1,001 chains over local enterprise documents and a controlled web corpus, split into 559 training, 98 validation, and 344 held-out-company test chains.
The benchmark measures three leakage types: intent leakage (inferring the agent’s research goals), answer leakage (enabling answers to specific private questions), and full-information leakage (allowing an observer to state verifiable private claims without prompts). The mosaic effect occurs when seemingly innocuous queries accumulate enough context for an adversary to reconstruct internal facts, such as a cloud-migration milestone or a security disclosure.
- Jun 20, 2026 · Schneier on Security
KPMG retracts AI report after GPTZero finds 40 of 45 citations were hallucinated
Trust76 - Jun 19, 2026 · arXiv cs.CL
Researchers propose TreeTracer, a visual analytics tool to detect hidden biases in large language models
Trust79 - Jun 19, 2026 · Schneier on Security
Malware developers embed policy-triggering text to disrupt AI-based analysis pipelines
Trust79