LLMs Just Got 12x Faster & Cheaper: Subquadratic’s Breakthrough

Miami-based AI startup Subquadratic recently emerged from stealth mode with a bold claim: that it had cracked a long-standing mathematical challenge that has bottlenecked large language models (LLMs) for nearly a decade. Details were scarce at first and many observers remained unconvinced, but the company is now starting to back its claims with independent evidence.

Subquadratic has unveiled the results of an independent evaluation of its new technology, and the findings suggest that its claims might be more than just hype. The company says that its novel LLM, dubbed SubQ, is significantly faster and cheaper than any other model currently available, and that it consumes far less energy. It also states that SubQ can process up to 12 times more text at once than most competitors, enabling it to tackle data-heavy tasks like analyzing hundreds of documents or entire codebases.

SubQ: A Game-Changer or Hype?

Adding to these impressive assertions, Subquadratic claims SubQ performs on par with top models from industry giants like Google DeepMind, OpenAI, and Anthropic, particularly in crucial tasks like coding. However, the initial lack of concrete evidence beyond self-published test scores and the model’s limited public availability naturally led to widespread skepticism.

The sentiment was perhaps best captured by AI engineer Dan McAteer, who posted on X: “SubQ is either the biggest breakthrough since the Transformer … or it’s AI Theranos.” A month later, Subquadratic has provided more comprehensive information, including the crucial independent test results from the third-party firm Appen.

“We expected healthy skepticism,” admitted Alex Whedon, Subquadratic’s cofounder and CTO. He reflected that releasing these third-party benchmarks alongside the initial announcement would have addressed much of the early doubt. This is why the company is committed to ensuring all future results are thoroughly verified before publication.

Independent Validation Brings Credibility

Appen, a company known for evaluating other firms’ AI models, conducted tests on SubQ, and its findings largely corroborate Subquadratic’s claims. “That was really exciting to me, it validated their architecture,” said Jeanine Sinanan-Singh, Appen’s director of generative AI research.

Sinanan-Singh added, “I was like, ‘Wow, this could be a game changer,’ because models struggle with speed and inefficiency. But when you have kind of shocking results, it’s really not as credible when you say it yourself.” While SubQ may not completely displace existing top models across all applications, it promises substantial speed increases at a fraction of the typical cost for specific tasks.

Subquadratic maintains that its breakthrough could ultimately reshape how LLMs are constructed. “We hope we’re kicking off a new age of efficiency,” said Justin Dangel, the firm’s cofounder and CEO, confidently asserting, “We don’t think anybody will be building on transformers in a few years.”

The Transformer Bottleneck and Subquadratic’s Solution

To fully grasp the significance of Subquadratic’s claims, it’s essential to understand the inner workings of most current LLMs. The foundational mechanism in these models is a type of neural network called a transformer, which relies on a process known as dense attention. Modern LLMs typically stack many transformer layers on top of one another, a design rooted in Google’s influential 2017 paper, “Attention Is All You Need.”

Dense attention works by first encoding each word (or token) in a chunk of text as numbers. To grasp the text’s overall meaning, it then compares every word with every other word in that text by multiplying their numbers together. For instance, a 10,000-word text contains close to 50 million distinct word pairs (10,000 × 9,999 / 2), each requiring its own multiplication, a staggering computational load that helps make LLMs notorious energy consumers.

As Dangel explained, “If you want to summarize The Great Gatsby, you have to look at the first word and the last word together, and then you have to look at every other combination.” The computational demands escalate dramatically with text length because each new word must be compared with all the words before it. Doubling the word count roughly quadruples the computations, a growth rate known as quadratic scaling.
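The arithmetic behind that growth rate can be checked in a few lines. The following is a minimal Python sketch that counts word pairs only; the function name is illustrative and does not come from Subquadratic or from any library.

```python
def pairwise_comparisons(n_tokens: int) -> int:
    """Number of distinct word pairs in a text of n_tokens words."""
    return n_tokens * (n_tokens - 1) // 2


# The article's example: a 10,000-word text.
pairs_10k = pairwise_comparisons(10_000)

# Double the length and compare.
pairs_20k = pairwise_comparisons(20_000)
growth = pairs_20k / pairs_10k

print(pairs_10k)         # 49995000, i.e. close to 50 million
print(round(growth, 2))  # 4.0: doubling the text roughly quadruples the work
```

Real models multiply vectors rather than single numbers and repeat the process in every layer, so the true cost is higher, but the quadratic shape of the curve is the same.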

Subquadratic’s innovation lies in abandoning dense attention, the core of the transformer, in favor of sparse attention. This method drastically reduces the required computations by selecting only a subset of numbers for multiplication, based on the premise that not all word relationships in a text are equally vital. “Sparse attention says not all of those relationships are important, because they’re not,” Whedon elaborated.
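To see why skipping most relationships matters, consider a rough count of the work saved. This Python sketch assumes each word is compared with at most a fixed number of earlier words; the budget of 64 is a hypothetical figure chosen for illustration, not a number reported by Subquadratic.

```python
def dense_pairs(n_tokens: int) -> int:
    """Comparisons when every word is paired with every earlier word."""
    return n_tokens * (n_tokens - 1) // 2


def sparse_pairs(n_tokens: int, budget: int) -> int:
    """Comparisons when each word is paired with at most `budget` earlier words."""
    # Word number i (counting from 0) has only i earlier words to choose from.
    return sum(min(i, budget) for i in range(n_tokens))


BUDGET = 64  # hypothetical per-word budget, for illustration only

dense_10k = dense_pairs(10_000)
sparse_10k = sparse_pairs(10_000, BUDGET)
sparse_20k = sparse_pairs(20_000, BUDGET)

print(dense_10k)                         # 49995000
print(sparse_10k)                        # 637920
print(round(sparse_20k / sparse_10k, 2)) # 2.0: work doubles when the text doubles
```

Under this assumption the cost grows in step with text length instead of with its square. The count leaves out the cost of deciding which words to keep, which is the hard part of any sparse-attention design.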

While sparse attention isn’t a new concept, previous attempts haven’t managed to capture document meaning as effectively as dense attention. Will Depue, an independent AI researcher and former OpenAI employee, likened the challenge to “running a four-minute mile.” Subquadratic, however, claims to have finally cracked this problem, positioning SubQ as the first sparse-attention LLM to match the performance of mainstream dense-attention models.

Early Results and Future Prospects

Subquadratic’s “secret sauce” lies in its dynamic selection process, where the model decides on the fly which words to focus on, adapting to each unique piece of text. Whedon clarified that “Historically, most mechanisms have used fixed patterns, like always comparing the first word to the fifth. That’s pretty limiting. Language is too sophisticated for that.”
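The difference between a fixed pattern and a content-dependent choice can be illustrated with a toy example. Subquadratic has not published its selection mechanism, so the Python sketch below uses a generic "top-k" rule as a stand-in: keep only the k highest-scoring words for each query. All function names and numbers are illustrative.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def weighted_sum(weights, values):
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def dense_attention(query, keys, values):
    """Dense attention: every key contributes to the output."""
    scale = math.sqrt(len(query))
    scores = [dot(query, key) / scale for key in keys]
    return weighted_sum(softmax(scores), values)


def topk_sparse_attention(query, keys, values, k):
    """Generic top-k sparse attention: only the k best-matching keys contribute.

    Which keys survive depends on the query, so the pattern changes with the text.
    """
    scale = math.sqrt(len(query))
    scores = [dot(query, key) / scale for key in keys]
    keep = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
    weights = softmax([scores[i] for i in keep])
    return weighted_sum(weights, [values[i] for i in keep])


# Tiny made-up example: three words, two numbers per word.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
query = [3.0, 1.0]

dense_out = dense_attention(query, keys, values)
top1_out = topk_sparse_attention(query, keys, values, k=1)
all_out = topk_sparse_attention(query, keys, values, k=len(keys))
```

Note that this toy version still scores every word before discarding most of them, so it saves little on its own. A practical system needs a cheaper way to find the promising candidates, and that step is not shown here.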

This approach allows SubQ to be faster and more cost-effective for certain tasks. Appen’s evaluation revealed that in a pure speed test, SubQ was an astonishing 56 times faster than models using FlashAttention, a widely used optimized implementation of dense attention. On LiveCodeBench, a benchmark for competitive coding problems, SubQ achieved an 89.7% score, placing it among top coding models. Sinanan-Singh noted, “This model continues to provide frontier-level performance in coding.”

Cost claims are harder to verify without broad access, but Dangel offered a compelling comparison: running Anthropic’s LLM Opus 4.6 through Nvidia’s RULER 128 test (for retrieving information from large datasets) costs around $2,600, whereas SubQ performed the same task for a mere $8. SubQ also boasts an impressive context window of up to 12 million tokens, compared to the typical one million for most top models. In a demonstration, SubQ analyzed information from 400 documents in seconds, a task that overwhelmed a popular LLM-powered search engine.

The “needle-in-a-haystack” test further demonstrated SubQ’s prowess in retrieving specific information from vast datasets. Appen reported that SubQ scored 98% with context windows of six and 12 million tokens, “sustaining near-perfect long-context retrieval at scales few models are tested at.” Despite these high scores, benchmarks offer an incomplete picture. Real-world applications across diverse tasks are still needed.
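For readers unfamiliar with the format, a needle-in-a-haystack test hides one fact inside a long stretch of filler text and checks whether the model can retrieve it. The Python sketch below shows the general shape of such a harness. It is not Appen’s protocol: the filler words, the passcode, and the `toy_model` stand-in (a plain text search used in place of a real LLM call) are all hypothetical.

```python
import random


def build_haystack(needle: str, n_filler: int, position: int, seed: int = 0) -> str:
    """Bury one 'needle' sentence at a chosen position among filler sentences."""
    rng = random.Random(seed)
    words = ["river", "stone", "cloud", "garden", "window", "paper"]
    sentences = [
        " ".join(rng.choice(words) for _ in range(8)) + "."
        for _ in range(n_filler)
    ]
    sentences.insert(position, needle)
    return " ".join(sentences)


def toy_model(context: str, keyword: str) -> str:
    """Stand-in for an LLM call: returns the first sentence containing keyword."""
    for sentence in context.split("."):
        if keyword in sentence:
            return sentence.strip()
    return ""


def retrieval_accuracy(model, needle, secret, n_filler, positions) -> float:
    """Fraction of needle positions at which the model's answer contains the secret."""
    hits = 0
    for pos in positions:
        context = build_haystack(needle, n_filler, pos, seed=pos)
        if secret in model(context, "passcode"):
            hits += 1
    return hits / len(positions)


needle = "The passcode is 7421."
accuracy = retrieval_accuracy(toy_model, needle, "7421", 200, [0, 50, 100, 150, 200])
print(accuracy)
```

A real evaluation would replace `toy_model` with calls to the model under test and scale the haystack to millions of tokens, varying where the needle sits to see whether accuracy drops at particular depths.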

Subquadratic is positioning SubQ for coding and large-scale data searching. Tens of thousands of people have already signed up for early access, including over 500 enterprise customers, though access remains limited because the company is still young. Some skepticism persists, especially over the model’s use of weights from a version of the Chinese open-source model Qwen rather than training from scratch, but Subquadratic stands by its core innovation. As Depue acknowledges, “They may have built something real and useful, but the public evidence does not yet justify the stronger claim that they have solved the quadratic attention bottleneck.”

Source: MIT Tech Review – AI

Kristine Vior

With a deep passion for the intersection of technology and digital media, Kristine leads the editorial vision of HubNextera News. Her expertise lies in deciphering technical roadmaps and translating them into comprehensive news reports for a global audience. Every article is reviewed by Kristine to ensure it meets our standards for original perspective and technical depth.
