Anthropic’s new AI model, Fable 5, has drawn controversy across the tech world. The model, a scaled-down version of the Mythos-class AI, promised users access to cutting-edge capabilities. However, an undisclosed behavior sparked a significant backlash, leading many to question Anthropic’s approach to AI safety and user trust.
Fable 5 was unveiled as a publicly accessible version of Mythos, an AI developed through Project Glasswing, a collaborative effort to fortify internet infrastructure against vulnerabilities. While Mythos itself was restricted to a select few organizations due to its immense power to both find and exploit flaws, Fable 5 was designed for wider use, albeit with clear safeguards against misuse in areas like bioweapons development.
The Hidden Downgrade That Rocked the AI Community
Anthropic explicitly stated that Fable 5 would not support certain high-risk research areas in cybersecurity, biology, and chemistry. When users attempted to engage in these prohibited activities, the model would visibly downgrade to Opus-level intelligence, informing them of the change. This transparency was crucial, as users understood they were operating within defined safety parameters.
However, a different scenario unfolded for researchers working on other sensitive, yet legitimate, projects, such as advanced chip designs or frontier large language models. In these cases, Fable 5 silently downgraded to Opus without notifying the user. Researchers were therefore testing and receiving results from a less capable model while believing they had Fable 5’s full capabilities.
The only disclosure of this silent downgrade was buried in the 319-page Fable and Mythos System Card, which stated that the behavior would not be visible to users. Few users would read such a document page by page, so the information went effectively unnoticed. The lack of visible feedback led to widespread frustration and to accusations of “secret sabotage” from publications such as Fortune and Wired.
Expert Opinions on Fable 5’s Safeguards
Rob T. Lee, Chief AI Officer at SANS Institute, offered a nuanced perspective on Fable 5’s design, calling it “a novel solution, and a smart one.” He acknowledged its potential to deter malicious actors but warned that the same protective layers could inadvertently impede legitimate defensive research. Lee shared his own experience: his requests for help building digital forensics skills were downgraded to Opus 4.8, which he said prevented him from developing new defensive capabilities.
Lee also raised concerns about the human factor, highlighting that even restricted access to powerful models like Mythos could be compromised. He emphasized that thousands of employees across partner organizations present potential vulnerabilities, where insider threats or foreign adversaries could gain access. This underscores that technological safeguards, while important, are not foolproof against determined human actors.
Ashley Casovan, managing director of IAPP’s AI Governance Center, commended Anthropic for holding back Mythos to implement necessary guardrails. Similarly, Chris Boehm of Zero Networks viewed Fable 5’s release as an accomplishment of restraint, turning raw power into something safe enough for wider deployment. Wider deployment lets ordinary defenders operate at attacker speed, provided the safeguards prove robust.
Anthropic’s Response and Ongoing Challenges
Facing significant public pressure, Anthropic responded quickly, apologizing for “not getting the balance right” and announcing a change: all flagged requests for frontier LLM development will now visibly fall back to Opus 4.8. Additionally, API requests will return a clear reason for refusal, so users can see when and why a safeguard was triggered. The company explained that its initial hidden approach was an attempt to make safeguards harder to circumvent and to allow narrower targeting, an approach it has now reversed.
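To illustrate why a visible fallback matters to developers, here is a minimal Python sketch of a client-side check that compares the model a caller requested with the model that actually answered. Everything specific in it is an assumption made for illustration: the response keys `model` and `fallback_reason`, the identifiers `fable-5` and `opus-4.8`, and the reason string are hypothetical placeholders, not documented Anthropic API fields.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FallbackNotice:
    """Records that a request was served by a different model than requested."""
    requested_model: str
    served_model: str
    reason: Optional[str]


def detect_fallback(requested_model: str, response: dict) -> Optional[FallbackNotice]:
    """Return a FallbackNotice if a different model answered, else None.

    The keys "model" and "fallback_reason" are hypothetical placeholders,
    not documented API fields.
    """
    served = response.get("model", requested_model)
    if served == requested_model:
        return None
    return FallbackNotice(requested_model, served, response.get("fallback_reason"))


# Illustrative payloads only; real responses may look different.
downgraded = {"model": "opus-4.8", "fallback_reason": "frontier_llm_development"}
normal = {"model": "fable-5"}

notice = detect_fallback("fable-5", downgraded)
if notice is not None:
    print(
        f"Served by {notice.served_model} instead of "
        f"{notice.requested_model}: {notice.reason}"
    )
```

Recording the served model alongside each result is one way a research team could tell which outputs came from which model, which is exactly the information the silent downgrade withheld.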
Anthropic also reiterated its reasoning for these safeguards, stating they prevent “foreign adversaries from using our most capable models in ways that pose severe safety risks” and “erode [the US and allies’] edge in frontier chips and the highly optimized software that runs them.” While this stance is firm, experts like Etay Maor from Cato Networks warn that highly motivated attackers will likely find ways around restrictions, focusing on context manipulation or capability distillation if direct exploitation is blocked.
False positives, in which legitimate requests are incorrectly flagged, remain a concern. Anthropic acknowledges this, noting that although its current classifiers affect only a tiny fraction of tasks and organizations, visible safeguards might cast a wider net. Balancing security against usability remains a hard technical problem for AI developers.
Data Retention Policies Add to the Debate
Adding another layer to the discussion, Anthropic’s 30-day data retention policy for Fable and Mythos-class models has raised eyebrows, particularly within regulated industries. Unlike many of its other products, which offer zero-data-retention agreements, these specific models require data retention to enable their critical safety classifiers.
This policy reportedly prompted Microsoft to limit employee use and engage legal teams to evaluate the implications. Although the retention issue is not new and not unique to Fable, its appearance in the news alongside the downgrade controversy highlights the growing importance of data governance in advanced AI deployments. Enterprises, particularly in sensitive sectors, should scrutinize these policies for compliance with legal and regulatory requirements before integrating such tools.
Source: ZDNet – AI