
Anthropic has just unveiled two significant additions to its AI model lineup: Claude Fable 5 and Claude Mythos 5. These new models promise enhanced capabilities, building upon the foundations laid by the previously released Mythos Preview. The company has acknowledged the powerful, and potentially risky, nature of these advanced AIs, particularly concerning their ability to discover software vulnerabilities.
The initial release of Mythos Preview in April was strictly limited to a select group of tech industry partners, a decision driven by Anthropic’s concerns about potential misuse by malicious actors. This cautious approach continues with the latest models, as Anthropic navigates the complex landscape of AI safety and accessibility.
Introducing Claude Fable 5 and Mythos 5
Claude Mythos 5 is currently accessible only to a restricted group of industry partners, many of whom were part of the Mythos Preview program. Anthropic is working closely with the US government on this controlled rollout, emphasizing a collaborative effort to ensure responsible deployment. This limited access underscores the company’s commitment to mitigating risks associated with powerful AI.
In contrast, Claude Fable 5 is being publicly released, leveraging the same underlying model as Mythos 5. However, it comes with built-in “guardrails” designed to prevent its misuse. These safeguards will automatically reroute user queries related to cybersecurity, biology, and chemistry to an older, less capable model, Claude Opus 4.8, if they are deemed sensitive or potentially harmful.
Anthropic also plans to detect and reroute requests suspected of attempting “distillation”—the process of training a smaller AI model using responses from a larger one—back to Claude Opus 4.8. This measure aims to protect the integrity and proprietary nature of their advanced models. This careful balancing act highlights the ongoing challenge of making powerful AI widely available while maintaining safety.
Navigating Advanced Capabilities and Safety
Diane Penn, Anthropic’s head of Product Management, revealed in an interview with WIRED that the company has been grappling with the implications of Mythos’s advanced capabilities, particularly its software vulnerability-discovery abilities. Extensive testing and user feedback since the April release have been crucial in refining their deployment strategy. Penn stated, “Out of all the different approaches, this emerged as the most viable and the best one.”
For now, the protective mechanisms are designed to err on the side of caution. This means some benign user queries might be inadvertently rerouted to the older AI model. While Anthropic hopes to enhance the precision of its classifiers over time, this cautious approach was deemed the safest way to broadly release the model at this juncture.
In addition to Project Glasswing partners, access to Claude Mythos 5 is also being granted to “select biology researchers.” Anthropic’s blog post hinted at future plans to expand access even further, noting that these small groups receive unrestricted versions “until our trusted access program is available.” This suggests a phased rollout, prioritizing safety and responsible usage at each step.
Project Glasswing and the Cost of Innovation
The ability of advanced AI models like Claude Mythos to design hacking tools and uncover software vulnerabilities has compelled tech companies and governments to bolster their cybersecurity defenses. Project Glasswing, under which Mythos was initially released to industry partners, was conceived to give members a crucial head start. This consortium aims to help prepare systems and develop global solutions against potential threats before widespread AI availability.
Anthropic recently updated Project Glasswing, stating, “We’re working as quickly as we can to safely release Mythos-level capabilities in general access.” The company acknowledges the need for “highly robust safeguards” to prevent misuse, which, to their knowledge, have yet to be fully developed by any AI provider. This transparency underscores the industry-wide challenge of securing such powerful technology.
Claude Fable 5, named alongside existing models like Haiku, Sonnet, and Opus, boasts increased performance in software engineering and visual understanding tasks. However, this enhanced capability comes with a higher price point. Developers will incur costs of $10 per million input tokens and $50 per million output tokens, which is double the cost of Anthropic’s other publicly available models, though cheaper than Mythos Preview.
The carefully guarded release of Claude Fable 5 highlights Anthropic’s business dilemma: the desire to make Mythos-class AI available for general use, balanced against unresolved cybersecurity concerns. OpenAI has also taken similar steps, privately launching an advanced cybersecurity model and forming a working group. Both companies, rumored to be filing for IPOs, are in a race to impress investors while navigating the complexities of AI safety and public access.
Despite more than 1,000 hours of red-teaming revealing no universal “jailbreaks” for the model, concerns about adequate protection persist. Whether Claude Fable 5’s safeguards will hold up in real-world scenarios remains to be seen. These lingering fears are precisely why Anthropic initially refrained from a broad public release of Mythos-class models, and they continue to shape the company’s strategy.
Source: Wired – AI