Subquadratic has released more evidence for SubQ, a new LLM that it says avoids a long-running bottleneck in transformer-based systems. The Miami startup claims SubQ is faster, cheaper and less energy-intensive than existing models, while processing up to 12 times as much text at once and matching top systems from Google DeepMind, OpenAI and Anthropic on key tasks such as coding.
The company’s approach replaces dense attention, which compares every token with every other token, with sparse attention that selects only some relationships to compute. Subquadratic says its method chooses those relationships dynamically for each input, rather than relying on fixed patterns. Appen, a third-party evaluation firm, found that SubQ was 56 times faster than models using FlashAttention in a speed test and scored 89.7% on LiveCodeBench.
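To make the dense-versus-sparse distinction concrete, here is a minimal sketch in plain Python. It is not Subquadratic's method, which the article does not describe in detail. The function names and the top-k selection rule are illustrative assumptions, standing in for whatever dynamic selection a real system uses. This toy version still scores every key before choosing, so it shows only which relationships are kept, not how a production system would avoid the quadratic scoring step.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def dense_attention(queries, keys, values):
    """Dense attention: every query is compared with, and mixes, every key."""
    dim = len(values[0])
    out = []
    for q in queries:
        weights = softmax([dot(q, k) for k in keys])
        out.append([sum(w * v[d] for w, v in zip(weights, values))
                    for d in range(dim)])
    return out


def topk_sparse_attention(queries, keys, values, k):
    """Sparse attention sketch: each query keeps only its k best-scoring keys.

    Assumption: top-k is a hypothetical selection rule chosen for illustration.
    The kept set depends on the input, so it is dynamic rather than a fixed
    pattern, but scoring all keys first is itself quadratic in this toy.
    """
    dim = len(values[0])
    out = []
    for q in queries:
        scores = [dot(q, key) for key in keys]
        keep = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
        weights = softmax([scores[i] for i in keep])
        out.append([sum(w * values[i][d] for w, i in zip(weights, keep))
                    for d in range(dim)])
    return out


def pair_counts(n_tokens, k):
    """Token pairs mixed by dense attention vs. a k-per-query sparse scheme."""
    return n_tokens * n_tokens, n_tokens * k
```

With 1,000 tokens and 10 kept keys per query, `pair_counts(1000, 10)` gives 1,000,000 pairs for the dense case against 10,000 for the sparse one. That gap widens as the input grows, which is why sparse schemes are attractive for very long contexts.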
Cost and broader usefulness remain harder to verify because SubQ is not widely available. Dangel said it costs $2,600 to run Anthropic’s Opus 4.6 model through RULER 128, compared with $8 for SubQ. Appen also reported that SubQ scored 98% on a needle-in-a-haystack retrieval test with context windows of 6 million and 12 million tokens.
Skepticism persists because benchmarks do not capture real-world performance, few users have tested the model, and Subquadratic reused weights from a version of the open-source Qwen model rather than training SubQ from scratch. The company says more than 500 enterprise customers are among those seeking early access.
