Startup claims sparse-attention LLM rivals top dense models on coding benchmarks
Subquadratic’s SubQ model touts up to 56x speed gains and 12x context length vs. mainstream LLMs, with limited third-party tests behind some of the claims.
1 source · cross-referenced
- Subquadratic, a Miami-based AI startup, claims its sparse-attention LLM SubQ matches top models on coding benchmarks while using far less compute.
- Third-party tests by Appen found SubQ 56x faster than models using FlashAttention in a theoretical speed test and recorded a score of 89.7% on LiveCodeBench, comparable to leading models.
- SubQ processes up to 12x more text at once than most models, according to the company, enabling large-scale document and codebase analysis.
- The startup has released limited third-party benchmarks after initial skepticism; experts say more independent validation is needed.
Subquadratic, a Miami-based AI startup, says it has developed a large language model that rivals top dense-attention models on coding tasks while using far less compute. The company’s model, SubQ, uses a sparse-attention mechanism it claims solves a long-standing bottleneck in LLM efficiency. Subquadratic emerged from stealth in May 2026 with a bold claim: it had overcome a mathematical inefficiency that has constrained LLMs for nearly a decade.
After initial pushback over a lack of evidence, Subquadratic has begun releasing third-party test results. Appen, a generative AI evaluation firm, ran a suite of tests on SubQ, including a theoretical speed benchmark and LiveCodeBench, a competitive coding benchmark drawn from real contests. In the speed test, SubQ was 56 times faster than models using FlashAttention, a widely used, memory-efficient implementation of exact (dense) attention. On LiveCodeBench, SubQ scored 89.7%, placing it in the same range as leading coding models from Google DeepMind, OpenAI, and Anthropic.
The company also asserts that SubQ can process up to 12 times as much text at once compared with mainstream models, enabling tasks like analyzing entire codebases or large document collections. Subquadratic cofounder and CTO Alex Whedon described the approach as selecting only the most meaningful token relationships dynamically, rather than computing all pairwise interactions as in dense attention. “Sparse attention says not all of those relationships are important,” Whedon said. “If you’re reading a book, you’re not going to look at the first and second words, first and third—that’s insane.”
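To see why context length is tied to the attention bottleneck, a rough count of token pairs helps. The sketch below is illustrative arithmetic only: the context length and the per-token budget are made-up numbers, not figures from Subquadratic or Appen, and the function names are hypothetical. It counts only how many pairwise scores get computed and ignores whatever it costs to decide which pairs to keep.

```python
def dense_pairs(n_tokens: int) -> int:
    """Pairwise scores in dense attention: every token scores every token."""
    return n_tokens * n_tokens


def sparse_pairs(n_tokens: int, kept_per_token: int) -> int:
    """Pairwise scores if each token attends to at most `kept_per_token` others."""
    return n_tokens * min(kept_per_token, n_tokens)


base_context = 100_000               # illustrative length, not from the article
longer_context = 12 * base_context   # the "12 times as much text" case
budget = 1_024                       # hypothetical fixed per-token budget

# How much the number of scored pairs grows when the context grows 12x.
dense_growth = dense_pairs(longer_context) / dense_pairs(base_context)
sparse_growth = sparse_pairs(longer_context, budget) / sparse_pairs(base_context, budget)

print(dense_growth)   # 144.0
print(sparse_growth)  # 12.0
```

Under these assumptions, a 12x longer input means 144x more pairwise scores for dense attention but only 12x more when each token keeps a fixed number of relationships, which is the kind of gap a sparse scheme aims to exploit.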
Subquadratic did not disclose the exact method for selecting which tokens to attend to, calling it the “secret sauce,” but emphasized that the selection is computed on the fly and varies by input. The firm’s CEO, Justin Dangel, framed the breakthrough as potentially ushering in a new era of efficiency: “We hope we’re kicking off a new age of efficiency. We don’t think anybody will be building on transformers in a few years.”
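Because the selection method is undisclosed, the following is not Subquadratic's technique. It is a minimal sketch of one generic, well-known idea, top-k sparse attention, in which each query keeps only its highest-scoring keys, so the kept set is computed on the fly and differs by input. The function `attend` and the toy vectors are assumptions made for illustration. Note that this toy version still scores every pair before discarding most of them, so it saves no compute; a genuinely subquadratic method would have to pick candidates without building the full score table.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend(queries, keys, values, top_k=None):
    """Scaled dot-product attention over plain Python lists.

    With top_k=None every key is used (dense attention). With top_k set,
    each query keeps only its top_k highest-scoring keys.
    Returns (outputs, kept), where kept[i] lists the key indices used by query i.
    """
    scale = math.sqrt(len(queries[0]))
    dim = len(values[0])
    outputs, kept = [], []
    for q in queries:
        scores = [dot(q, k) / scale for k in keys]
        # Rank keys by score, highest first; ties keep their original order.
        order = sorted(range(len(keys)), key=lambda j: scores[j], reverse=True)
        chosen = sorted(order if top_k is None else order[:top_k])
        # Softmax only over the keys that survived selection.
        weights = softmax([scores[j] for j in chosen])
        out = [sum(w * values[j][d] for w, j in zip(weights, chosen)) for d in range(dim)]
        outputs.append(out)
        kept.append(chosen)
    return outputs, kept


keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
values = [[1.0], [2.0], [3.0], [4.0]]
queries = [[1.0, 0.0], [0.0, 1.0]]

_, kept = attend(queries, keys, values, top_k=2)
print(kept)  # [[0, 2], [1, 2]]
```

The two queries end up attending to different keys, which mirrors the company's description of a selection that varies by input, though nothing here indicates how SubQ actually makes that choice.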
Industry observers remain cautious. Independent AI researcher Will Depue, a former OpenAI employee, characterized the challenge as akin to “running a four-minute mile,” noting that many have attempted sparse attention without achieving parity with dense models. Appen’s director of generative AI research, Jeanine Sinanan-Singh, called the results “exciting” and “validating,” but added that surprising results require independent verification. Subquadratic has not yet made SubQ widely available for public testing, which limits external scrutiny of its claims.