SubQ LLM Breakthrough: Subquadratic Attention Promises Unprecedented Context and Efficiency
Alexander Whedon has announced SubQ (short for subquadratic), a new large language model (LLM) architecture that he claims is a major breakthrough in long-context task performance. SubQ is said to match the intelligence of frontier models like Opus 4.7 and GPT-5.5 while being dramatically more efficient. The core innovation is a sparse attention mechanism called “content-dependent selection.” Dense attention in current models compares every token with every other token, so its cost grows with the square of the context length; SubQ's mechanism is claimed to avoid that quadratic growth. It reportedly delivers significantly faster inference on tasks with a 1 million-token context and cuts costs to just 5% of Opus's. The initial SubQ model is promised to ship with an unprecedented 12 million-token context window, large enough to process entire codebases or multiple large legal documents at once, which could make existing workarounds like RAG and sub-agents obsolete.
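To make the dense-versus-sparse contrast concrete, here is a minimal sketch of generic top-k sparse attention, in which each query attends only to its k highest-scoring keys. This is an illustration under stated assumptions, not SubQ's method: the details of content-dependent selection have not been published, and the function names and the budget of k = 4,096 keys per query are hypothetical.

```python
import math


def dense_pair_count(n: int) -> int:
    """Query-key scores computed by dense attention over n tokens: n * n."""
    return n * n


def sparse_pair_count(n: int, k: int) -> int:
    """Query-key scores kept if each query attends to at most k keys."""
    return n * min(k, n)


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def topk_attention(query, keys, values, k):
    """Attend from one query to the k keys with the highest dot-product score.

    Assumption: plain top-k selection stands in for whatever selection rule
    SubQ uses. Note that scoring every key in order to pick the top k is
    itself linear per query, so this naive version is still quadratic
    overall; a truly subquadratic model needs a cheaper selection step.
    """
    scores = [sum(q * x for q, x in zip(query, key)) for key in keys]
    top = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
    scale = math.sqrt(len(query))
    weights = softmax([scores[i] / scale for i in top])
    dim = len(values[0])
    out = [sum(w * values[i][d] for w, i in zip(weights, top)) for d in range(dim)]
    return out, sorted(top)


if __name__ == "__main__":
    n, k = 1_000_000, 4096  # k is a hypothetical per-query budget
    print(dense_pair_count(n))      # score computations for dense attention
    print(sparse_pair_count(n, k))  # score computations with k keys per query

    out, picked = topk_attention(
        query=[1.0, 0.0],
        keys=[[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [-1.0, 0.0]],
        values=[[1.0], [2.0], [3.0], [4.0]],
        k=2,
    )
    print(picked, out)
```

At a 1 million-token context, the dense count is 10^12 pairs, while a fixed budget of 4,096 keys per query gives about 4.1 × 10^9. That gap is why the way keys are selected matters so much: the savings only hold if the selection step is itself cheaper than scoring every key.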
While the claims are substantial, publicly available technical details and comprehensive benchmarks are limited. Initial results from the RULER benchmark show SubQ performing on par with Opus 4.6 for retrieval and reasoning within a 128,000-token context. On the MRCR v2 long-context retrieval benchmark, SubQ lands in the same range as Opus 4.6, though it is noted as “definitely worse” in some respects, while scoring above Gemini 3.1 Pro and Opus 4.7 in the same table. A software engineering benchmark also indicates that it can generate meaningful code from long contexts. With no model card and no detailed technical account of how the content-dependent selection mechanism works, the industry remains cautious. If SubQ’s ambitious claims hold up, it could ease current compute constraints and unlock AI use cases previously limited by context window size and quadratic scaling.