Why extended Artificial Intelligence reasoning may be wasted spend

Research and practical testing suggest many reasoning models generate long chains of thought that do not materially improve answers on routine tasks. That could mean much of the cost of premium Artificial Intelligence usage pays for reasoning that performs, visibly and invisibly, rather than for better results.

A practical test inside a personal CRM raised doubts about the value of premium reasoning models for everyday work. The system analyzes emails, meeting notes, public information, and relationship context across a large contact base, but the most capable models were reserved for the most important cases because of cost. With over 800 active contacts and a need for cross-contact analysis, the app generates a large volume of LLM queries. During one session, 47 premium contacts were updating at once, each using extended reasoning. After several weeks of use, the outputs from the expensive model and the cheaper one appeared effectively identical in quality, while the premium option was slower and more costly.

That experience aligns with recent research questioning whether visible and hidden reasoning steps are actually doing useful work. A paper by Basu and Chakraborty tested 10 frontier models, including GPT-5.4, Claude Opus, and DeepSeek, across four task types. Their method removed one reasoning step at a time and checked whether the final answer changed. For most models on most tasks, removing any single step changed the answer less than 17% of the time. The implication was that no single step was individually necessary, even when the reasoning looked coherent and persuasive.
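The ablation method described above can be sketched in a few lines. This is a toy illustration, not the paper's actual harness: `final_answer` stands in for the model's answer stage, and the sensitivity score is the fraction of single-step deletions that flip the final answer.

```python
# Leave-one-out ablation over reasoning steps (toy sketch).
# A low score means no single step is individually necessary,
# which is the pattern the paper reports on most tasks.

def final_answer(question, steps):
    # Stand-in for the model's answer stage; here a toy rule that
    # only depends on whether any step mentions the key fact.
    return "yes" if any("key fact" in s for s in steps) else "no"

def step_sensitivity(question, steps):
    """Fraction of single-step deletions that change the answer."""
    baseline = final_answer(question, steps)
    flips = 0
    for i in range(len(steps)):
        ablated = steps[:i] + steps[i + 1:]
        if final_answer(question, ablated) != baseline:
            flips += 1
    return flips / len(steps)

steps = [
    "Restate the question.",
    "Recall the key fact.",
    "Recall the key fact again, redundantly.",
    "Conclude.",
]
# Redundant steps mean no single deletion flips the answer.
print(step_sensitivity("toy question", steps))  # 0.0
```

Redundancy is the point of the example: because two steps carry the same fact, deleting either one leaves the answer unchanged, so every step looks individually unnecessary even though the chain as a whole is coherent.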

Separate research from Goodfire AI and Harvard examined when models had effectively already decided on an answer before finishing their reasoning. On straightforward questions, internal confidence converged on the correct answer very early, yet the models continued generating additional reasoning tokens. When the researchers forced the model to stop once it had already made up its mind, token use dropped by up to 80%, while accuracy remained comparable. That finding suggests a large share of reasoning output on routine tasks may be decorative rather than functional.
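The early-stopping idea can be sketched as follows, under a large assumption: that we can probe the model's current best answer while it is still reasoning (the researchers used internal confidence signals; `probe_answer` here is a toy stand-in). Generation halts once the probed answer is stable for a few consecutive probes.

```python
# Confidence-based early stopping (toy sketch). `probe_answer` is a
# hypothetical stand-in for reading out the model's current answer
# mid-reasoning; here the toy model "decides" after 3 tokens.

def probe_answer(tokens_so_far):
    return "42" if len(tokens_so_far) >= 3 else None  # None = undecided

def generate_with_early_stop(reasoning_tokens, patience=2):
    """Stop once the probed answer repeats `patience` times in a row."""
    used, stable, last = [], 0, None
    for tok in reasoning_tokens:
        used.append(tok)
        answer = probe_answer(used)
        if answer is not None and answer == last:
            stable += 1
        else:
            stable = 1 if answer is not None else 0
        last = answer
        if stable >= patience:
            break
    return last, len(used)

full_trace = ["think"] * 20  # the model would normally emit 20 tokens
answer, spent = generate_with_early_stop(full_trace)
print(answer, spent)  # '42' after 4 tokens instead of 20
```

In this toy run the trace is cut from 20 tokens to 4, an 80% reduction, which mirrors the scale of savings the Goodfire work reports on straightforward questions.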

The financial consequences could be significant because reasoning tokens are billed as output, the most expensive token category, and models can generate thousands of them before producing a short visible response. If 80% of those tokens on routine tasks are performative, as the Goodfire research suggests, then much of the cost of everyday Artificial Intelligence use may come from unnecessary computation. The recommended response is to test common tasks side by side on smaller or non-reasoning models and identify which queries truly need extended reasoning. Teams that cannot answer that question may be overspending on model behavior that looks impressive but does not improve results.
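The side-by-side test recommended above can start very simply: run the same tasks through both models and measure how often they agree. The `premium` and `cheap` callables below are placeholders for real model clients; a high agreement rate on a task type suggests the cheaper model suffices there.

```python
# Minimal sketch of a side-by-side model comparison. The two lambdas
# are toy stand-ins for real model calls; wire in actual clients to
# find which task types genuinely need extended reasoning.

def agreement_rate(tasks, ask_premium, ask_cheap):
    """Fraction of tasks where both models give the same answer."""
    same = sum(ask_premium(t) == ask_cheap(t) for t in tasks)
    return same / len(tasks)

# Toy stand-ins: the models agree on routine tasks, diverge on hard ones.
premium = lambda t: "deep" if "hard" in t else "ok"
cheap = lambda t: "shallow" if "hard" in t else "ok"

routine = ["summarize email", "extract dates", "classify note"]
hard = ["hard cross-contact synthesis"]

print(agreement_rate(routine, premium, cheap))  # 1.0 -> cheap model suffices
print(agreement_rate(hard, premium, cheap))     # 0.0 -> keep premium here
```

Exact-match agreement is a crude proxy; for free-form outputs a similarity score or human spot-check per task type would be the natural refinement.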

Impact Score: 55

Tencent WeKnora expands document retrieval and agent features

Tencent’s WeKnora is an open source framework for deep document understanding, semantic retrieval, and context-aware answers built on the Retrieval-Augmented Generation paradigm. Recent updates add new messaging integrations, model providers, storage and vector database options, and stronger security controls.

Judge temporarily blocks Pentagon action against Anthropic

A federal judge temporarily barred the Pentagon from labeling Anthropic a supply chain risk and blocked enforcement of a presidential directive telling agencies to stop using the company’s chatbot Claude. The ruling found the government’s measures appeared punitive and likely unlawful.

DRAM stocks fall after Google TurboQuant debut

DRAM manufacturers came under pressure after Google introduced TurboQuant, which it says can sharply reduce the memory needs of Artificial Intelligence models while speeding up inference. The announcement coincided with notable declines in shares of Micron, SK Hynix, and Samsung Electronics.

Nature paper details the Artificial Intelligence scientist project

Sakana Artificial Intelligence and academic collaborators have published a Nature paper describing The Artificial Intelligence Scientist, a system designed to automate the full machine learning research lifecycle. The work reports peer review results, reviewer benchmarking, and limits that still constrain the system.
