Apple research exposes reasoning model collapse on complex problems

A new Apple study finds that today's reasoning-focused Artificial Intelligence models fail catastrophically when faced with sufficiently complex logic puzzles.

Apple Machine Learning Research's latest preprint, "The Illusion of Thinking," examines how so-called reasoning models, dubbed Large Reasoning Models (LRMs), handle logical problem solving. The team designed a controlled puzzle environment to bypass industry-standard, potentially misleading benchmarks. On simpler puzzles, the plain language models outperformed their reasoning-enhanced counterparts; as puzzle complexity increased, the more advanced models pulled ahead, but only briefly.

The study reveals a stark limitation: as tasks become truly challenging, both standard and advanced models suffer a dramatic plunge in accuracy and effort. The models not only fail to produce correct answers but also generate less output, effectively abandoning their attempts at the hardest puzzles. Even explicit guidance, in which the models were handed the precise algorithm needed for a solution, did not overcome this barrier at high complexity levels.

This work builds on previous warnings from the same research group that current language models do not perform genuine reasoning but instead mimic patterns learned from their training data. The Apple team's critique is echoed by other researchers, notably Subbarao Kambhampati, who argues against equating intermediate token generation with actual thinking. The consensus: marketing claims and benchmarks that mask these weaknesses do little to change the reality that neural networks, however sophisticated, remain bounded by their training data and lack genuine reasoning capability when confronted with new, truly difficult challenges.

Impact Score: 72

Nvidia skips a new GeForce generation as Artificial Intelligence chips dominate

Nvidia is set to go a year without a new GeForce GPU generation for the first time since the 1990s as memory shortages and higher margins in Artificial Intelligence hardware reshape the market. AMD and Intel are also struggling to capitalize because the same supply constraints are hitting gaming products across the industry.

Where GPU debt starts to break

Stress in GPU-backed infrastructure financing is emerging around deals that lack the structural protections seen in the strongest transactions. Oracle, the Abilene Stargate project, and older CoreWeave debt illustrate different ways residual risk can surface when contracts, collateral, and counterparties fall short.

SK hynix starts mass production of 192 GB SOCAMM2

SK hynix has begun mass production of the 192 GB SOCAMM2, a next-generation memory module standard built on 1cnm LPDDR5X low-power DRAM. The module is positioned as a primary memory solution for next-generation Artificial Intelligence servers.

AMD taps GlobalFoundries for co-packaged optics in Instinct MI500

AMD is preparing a renewed manufacturing link with GlobalFoundries to bring co-packaged optics to its Instinct MI500 Artificial Intelligence accelerators. The move is aimed at improving bandwidth and power efficiency in data center systems by moving beyond copper-based interconnects.
