How Probably Aims for 99.99% AI Accuracy with $9M Funding

In the rapidly evolving landscape of artificial intelligence, Large Language Models (LLMs) have demonstrated impressive capabilities, yet one persistent problem remains: hallucinations. These are errors in which a model confidently generates incorrect or nonsensical information, and they affect even the most sophisticated systems. Methods exist to reduce them, but the industry has yet to find a definitive way to make LLM output reliably correct.

Enter Probably, a new company that has just raised $9 million in seed funding from Andreessen Horowitz. Its mission is to pioneer a more rigorous approach to identifying and preventing these AI errors. Founder Peter Elias envisions AI systems reaching the kind of 99.99% accuracy typically associated with deterministic systems, a standard that has so far been very difficult to attain with generative AI.

Rethinking AI Accuracy: The “Data Science Mech Suit”

Reaching that level of accuracy with LLMs requires a fundamental shift in how AI systems are engineered. Probably’s initial offering is a data science tool designed to extract quick, precise answers from complex datasets. A key feature of the tool, and an increasingly common practice in responsible AI development, is that each result comes with a clear citation and a transparent audit trail showing how it was generated.

To keep these answers free of errors, Probably has built an elaborate “harness system” that Elias likens to a “data science mech suit.” The LLM generates initial answers, which are then checked against a deterministic validator. Any result that does not match the original dataset is flagged and sent back for correction. Notably, the LLM itself has been trained against this validator, optimizing it for both speed and accuracy.
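The generate-and-validate loop described above can be illustrated with a short sketch. This is a toy under stated assumptions, not Probably’s implementation, which the article does not describe: the dataset, the `Query`, `Answer`, and `AuditEntry` types, `validate`, `run_harness`, and the stand-in `fake_llm` are all hypothetical. Because the question here is already structured, the validator can recompute the result exactly; each answer carries a citation (the row indices it used), and every attempt is recorded as an audit-trail entry.

```python
# Hypothetical sketch of a generate-then-validate harness; not Probably's code.
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical table standing in for "the original dataset".
DATA = [
    {"region": "north", "revenue": 120},
    {"region": "south", "revenue": 80},
    {"region": "north", "revenue": 45},
]


@dataclass(frozen=True)
class Query:
    """Structured question: total of `column` over rows where row[key] == value."""
    column: str
    key: str
    value: str


@dataclass(frozen=True)
class Answer:
    """A proposed result plus its citation: the row indices it was computed from."""
    total: int
    cited_rows: tuple[int, ...]


@dataclass(frozen=True)
class AuditEntry:
    """One step of the audit trail: what was proposed and whether it passed."""
    attempt: int
    answer: Answer
    passed: bool
    reason: str


def validate(data: list[dict], query: Query, answer: Answer) -> tuple[bool, str]:
    """Deterministic validator: recompute the result from the data and compare."""
    expected_rows = tuple(i for i, row in enumerate(data) if row[query.key] == query.value)
    expected_total = sum(data[i][query.column] for i in expected_rows)
    if answer.cited_rows != expected_rows:
        return False, f"citation {answer.cited_rows} should be {expected_rows}"
    if answer.total != expected_total:
        return False, f"total {answer.total} does not match data ({expected_total})"
    return True, "ok"


# The model is any function from (query, feedback from the last failure) to an Answer.
AnswerFn = Callable[[Query, Optional[str]], Answer]


def run_harness(generate: AnswerFn, data: list[dict], query: Query, max_attempts: int = 3):
    """Generate, validate, and bounce failures back until an answer passes."""
    audit: list[AuditEntry] = []
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        answer = generate(query, feedback)
        passed, reason = validate(data, query, answer)
        audit.append(AuditEntry(attempt, answer, passed, reason))
        if passed:
            return answer, audit
        feedback = reason  # the failure reason goes back to the generator
    return None, audit  # never surface an answer the validator rejected


# Stand-in for the LLM: the first reply cites the wrong rows, the second is correct.
replies = iter([Answer(200, (0, 1)), Answer(165, (0, 2))])


def fake_llm(query: Query, feedback: Optional[str]) -> Answer:
    # A real model would use `feedback`; this canned stub ignores it.
    return next(replies)


answer, audit = run_harness(fake_llm, DATA, Query("revenue", "region", "north"))
for entry in audit:
    print(entry.attempt, entry.passed, entry.reason)
```

Returning `None` once the attempt budget runs out, rather than the last rejected answer, reflects the idea that an unvalidated result should never reach the user.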

Smaller Models, Bigger Impact: Cost-Effective and Precision-Sensitive AI

Elias explains that the stronger the “harness engineering,” the less powerful the underlying model needs to be. Carefully refining the context given to the model reduces ambiguity, which lets even less capable models perform accurately. As a result, Probably’s data science tool can run on much smaller models and currently uses a system “four classes weaker than the frontier models.”

Smaller models bring practical benefits for users. They can often run on local hardware, such as a desktop computer, instead of in expensive, energy-intensive data centers, which cuts down on “token costs,” a growing concern for businesses managing their AI budgets. Elias sees the approach extending beyond data science to any “precision-sensitive use case,” such as accounting or medical services.

The Future of Reliable AI: Why Big Labs Aren’t Doing This

Probably’s strategy contrasts sharply with the development paths of many larger AI labs. “I think it’s really interesting that the big AI labs have not even attempted to do this,” Elias notes. He suggests that their business models reward a different approach, in which repeated user interactions and corrections indirectly add to revenue. In other words, a model that is less reliable at first may generate more engagement over time.

Probably, by contrast, is prioritizing accuracy and efficiency from the ground up. Its harness system and focus on smaller, more precise models offer a vision of AI in which reliability is a core design principle rather than an aspiration. As AI moves into critical sectors, demand for such dependable systems is likely to grow.

Source: TechCrunch – AI

Kristine Vior

With a deep passion for the intersection of technology and digital media, Kristine leads the editorial vision of HubNextera News. Her expertise lies in deciphering technical roadmaps and translating them into comprehensive news reports for a global audience. Every article is reviewed by Kristine to ensure it meets our standards for original perspective and technical depth.
