MosaicLeaks Reveals How AI Agents Leak Company Secrets

Imagine your company’s AI research agent diligently sifting through internal documents and external web searches to answer a critical question. Sounds efficient, right? But what if those seemingly innocent web queries inadvertently piece together sensitive, private information that anyone observing the outbound traffic could discover? This hidden danger, where fragments of data combine to reveal a full picture, is precisely what the new MosaicLeaks research highlights.

A recent study, “MosaicLeaks: Privacy Risks in Querying-in-the-Open for Deep Research Agents,” by Alexander Gurung, Rafael Pardinas, and their team from ServiceNow, uncovers a critical vulnerability in how deep research agents handle proprietary data. It shows that current AI agents, even when designed for efficiency, often struggle to keep enterprise secrets confidential. This groundbreaking work introduces a novel framework and a solution to mitigate these serious privacy risks.

Understanding the Mosaic Effect: How AI Leaks Secrets

The core problem lies in what researchers call the “mosaic effect.” Picture an AI agent at a healthcare firm trying to understand a cloud migration project. It might issue several web queries: one about a specific migration milestone, another about a security disclosure date, and a third to identify a vendor. Individually, these queries seem harmless, but collectively, they can betray sensitive internal information.

An outsider monitoring the agent’s web traffic could potentially reassemble these fragments. For example, they might deduce that “MediConn had migrated 70% of its infrastructure to the cloud by January 2025,” a fact previously confined to private company documents. The adversary doesn’t need access to internal files or the agent’s reasoning; the cumulative query log alone provides enough clues.

MosaicLeaks quantifies this leakage in three critical ways, representing increasing levels of privacy concern:

Intent Leakage: Reveals what the agent is investigating, giving away the research topic.
Answer Leakage: Means the query log contains enough information to answer a specific private question.
Full-Information Leakage: The most severe form, where an observer can proactively discover and state private facts without being prompted.

This nuanced approach allows us to see not just if information is leaked, but what *kind* of information is exposed.

The Challenge: Performance vs. Privacy

To rigorously test these risks, the MosaicLeaks benchmark was created. It features 1,001 multi-hop research chains combining local enterprise documents with a controlled web corpus. The tasks are specifically designed to induce privacy leakage while still being solvable without exposure, forcing agents to retrieve local information before formulating relevant web queries.

The initial findings were stark: current AI agents frequently leaked private information. Alarmingly, when models were trained solely to improve task performance, leakage didn’t just persist—it worsened. A model trained for better task completion saw its answer/full-information leakage climb from 34.0% to 51.7%, even as its strict chain success rose from 48.7% to 59.3%.

The reason? More efficient agents often pack more context into their web queries, which helps them find the right public documents. However, this richer context inadvertently provides more fragments for an observer to piece together. The study also tested a seemingly obvious solution: simply telling the agent in its prompt, “Don’t issue web queries that leak local information.” Unfortunately, this “prompting for privacy” proved inconsistent and largely ineffective, often reducing task performance without significantly curbing leakage.

Introducing PA-DR: Training AI for Privacy

Recognizing this fundamental tension, the researchers developed Privacy-Aware Deep Research (PA-DR), a new reinforcement learning training method that prioritizes both performance and privacy. PA-DR employs two innovative reward mechanisms:

Situational Task Reward: Instead of scoring an entire research trajectory at the end, PA-DR judges each individual step (like planning a search or choosing a document) against other actions taken at the same stage. This provides precise, immediate feedback, reinforcing good decisions and discouraging poor ones.
Learned Privacy Reward: Whenever the agent generates web queries, a specialized classifier estimates the risk of leakage—both direct exposure and contributions to a mosaic leak. PA-DR then penalizes the exact planning decision that increases this privacy risk.

The results from PA-DR are truly impressive. It boosted strict chain success from 48.7% to 58.7% while dramatically reducing answer/full-information leakage from 34.0% to a mere 9.9%. This means PA-DR not only maintains performance gains but also achieves a leakage rate significantly lower than the *untrained* base model.

Crucially, PA-DR doesn’t achieve this by simply making agents search less. Instead, it teaches them to search smarter and safer. The agents still issue a similar number of web queries, but they learn to omit revealing private details, such as specific metrics or sensitive dates, from their query text. This allows them to still find the necessary public documents without inadvertently exposing internal facts.

The Bottom Line: Train, Don’t Just Prompt

While MosaicLeaks provides a controlled benchmark and not a direct measurement of every deployed system, its core message is crystal clear: you can’t simply prompt privacy into an AI agent; you have to train it in. Telling an agent to be careful barely makes a difference. However, by carefully measuring and rewarding how an agent constructs each query, leakage can be cut by more than three times, all while maintaining essential task success.

The “mosaic effect” is a subtle but significant threat arising from how AI agents interact with information over time. The good news is that this behavior can be measured, attributed to specific decisions, and effectively mitigated through intelligent training methods like PA-DR. This work marks a crucial step forward in developing truly secure and trustworthy deep research agents for the enterprise.

Source: Hugging Face Blog

Kristine Vior

With a deep passion for the intersection of technology and digital media, Kristine leads the editorial vision of HubNextera News. Her expertise lies in deciphering technical roadmaps and translating them into comprehensive news reports for a global audience. Every article is reviewed by Kristine to ensure it meets our standards for original perspective and technical depth.

Understanding the Mosaic Effect: How AI Leaks Secrets

The Challenge: Performance vs. Privacy

Introducing PA-DR: Training AI for Privacy

The Bottom Line: Train, Don’t Just Prompt

Kristine Vior

Related Posts