Safety · Jun 18, 2026

Deep-research agents leak private enterprise data in new benchmark, ServiceNow/Hugging Face study finds

A new benchmark shows that deep-research agents frequently expose sensitive internal information through seemingly ordinary web queries, and standard training to improve task performance can worsen leakage.


TL;DR
  • Deep-research agents combining private documents with external web retrieval can leak sensitive information via cumulative web queries.
  • The MosaicLeaks benchmark introduces multi-hop tasks that interleave public and private data to test for privacy leakage.
  • Across tested models, agents leaked private information frequently; training for task performance increased leakage.
  • Proposed method PA-DR improves strict chain success by 10 percentage points while reducing full-information leakage from 34.0% to 9.9%.

A new benchmark called MosaicLeaks evaluates whether deep-research agents inadvertently expose private enterprise information through their external web queries. The benchmark simulates agents performing multi-hop research that interleaves private local documents with public web retrieval, where each query appears benign in isolation but can reveal sensitive facts when combined by an observer.

In experiments across multiple models, agents frequently leaked private information. Training the agents solely to improve task performance increased leakage: strict chain success rose from 48.7% to 59.3%, while answer/full-information leakage climbed from 34.0% to 51.7%. The richer queries needed for better performance provided more fragments for an adversary to reconstruct private facts.

The study proposes a privacy-aware reinforcement learning method, Privacy-Aware Deep Research (PA-DR), which improved strict chain success from 48.7% to 58.7% while reducing answer/full-information leakage from 34.0% to 9.9%. The MosaicLeaks benchmark itself comprises 1,001 multi-hop chains over local enterprise documents and a controlled web corpus, split into 559 training, 98 validation, and 344 held-out-company test chains.
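To make the reported trade-off easier to compare, the short Python sketch below recomputes the percentage-point changes and the dataset split total from the figures quoted in this article. The dictionary keys and the helper name `delta_pp` are illustrative choices, not names from the study, and "leakage" here stands for the answer/full-information figure as reported above.

```python
# Figures as quoted in this article; names are illustrative assumptions.
BASELINE = {"strict_chain_success": 48.7, "leakage": 34.0}
TASK_ONLY = {"strict_chain_success": 59.3, "leakage": 51.7}
PA_DR = {"strict_chain_success": 58.7, "leakage": 9.9}


def delta_pp(after, before):
    """Percentage-point change per metric, rounded to one decimal place."""
    return {key: round(after[key] - before[key], 1) for key in before}


# Change relative to the baseline agent for each training regime.
task_only_delta = delta_pp(TASK_ONLY, BASELINE)
pa_dr_delta = delta_pp(PA_DR, BASELINE)

# Chain counts per split, as reported for the benchmark.
SPLITS = {"train": 559, "validation": 98, "test": 344}
total_chains = sum(SPLITS.values())

print("task-only training:", task_only_delta)
print("PA-DR:", pa_dr_delta)
print("total chains:", total_chains)
```

On these numbers, task-only training gains slightly more strict chain success than PA-DR but raises leakage, while PA-DR lowers leakage well below the baseline.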

The benchmark measures three leakage types: intent leakage (inferring the agent’s research goals), answer leakage (enabling answers to specific private questions), and full-information leakage (allowing an observer to state verifiable private claims without prompts). The mosaic effect occurs when seemingly innocuous queries accumulate enough context for an adversary to reconstruct internal facts, such as a cloud-migration milestone or a security disclosure.
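The mosaic effect described above can be illustrated with a deliberately simplified Python sketch. It assumes a private fact can be reduced to a set of keyword fragments and that an observer sees only the text of outgoing queries. The company name, the queries, and the token-overlap test are all hypothetical; this is not how the benchmark scores leakage, which the article does not detail.

```python
def observed_fragments(queries):
    """Union of lowercase tokens across every query the observer has seen."""
    seen = set()
    for query in queries:
        seen.update(query.lower().split())
    return seen


def can_reconstruct(fact_fragments, queries):
    """True if every fragment of the private fact appears in some query."""
    return set(fact_fragments) <= observed_fragments(queries)


# Hypothetical private fact, reduced to the fragments an observer would need.
private_fact = {"acme", "cloud", "migration", "q3"}

# Hypothetical web queries; each one looks harmless when read alone.
queries = [
    "acme enterprise software overview",
    "cloud migration timeline best practices",
    "q3 infrastructure cutover checklist",
]

# No single query covers the fact, but the accumulated set does.
single_query_leaks = [can_reconstruct(private_fact, [q]) for q in queries]
combined_leak = can_reconstruct(private_fact, queries)

print(single_query_leaks, combined_leak)
```

The point of the toy model is only that leakage is a property of the whole query log rather than of any one query, which is why per-query filtering alone may miss it.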

Sources
  1. Hugging Face, "MosaicLeaks: Can your research agent keep a secret?"


© 2026 Dispatch. No ads. No sponsorships. No paid placement. Reader-supported via Ko-fi.
