Apple research exposes reasoning model collapse on complex problems

A new Apple study finds that today's reasoning-focused Artificial Intelligence models fail catastrophically when faced with sufficiently complex logic puzzles.

Apple Machine Learning Research's latest preprint, "The Illusion of Thinking," scrutinizes how so-called reasoning models, dubbed Large Reasoning Models (LRMs), handle logical problem solving. The team designed a controlled puzzle environment to bypass industry-standard benchmarks they consider potentially misleading. On simpler puzzles, plain language models outperformed their reasoning-enhanced counterparts; only at moderate complexity did the more advanced models pull ahead.

The study reveals a stark limitation: as tasks turn truly challenging, both simple and advanced models experience a dramatic plunge in accuracy and effort. These models not only fail to produce correct answers but also demonstrate reduced output, effectively abandoning attempts to solve more complex logic puzzles. Even explicit guidance—providing the models with the precise algorithm necessary for a solution—did not overcome this barrier at high complexity levels.

This work builds on previous warnings from the same research group, underscoring that current language models do not perform genuine reasoning but instead mimic patterns learned from their training data. The Apple team's critique is echoed by other researchers, notably Subbarao Kambhampati, who argues against equating intermediate token generation with actual thinking. The emerging consensus: marketing claims and favorable benchmarks mask these weaknesses, but neural networks, however sophisticated, remain bounded by their training data and lack authentic reasoning capability when confronted with new, truly arduous challenges.

EU digital omnibus on artificial intelligence: what is in it and what is not?

On November 19, 2025, the European Commission published a Digital Omnibus proposal intended to reduce administrative burdens and align rules across digital laws, including the Artificial Intelligence Act. The package offers targeted simplifications but leaves several substantive industry concerns unaddressed.

Tether Data launches QVAC Fabric LLM for edge-first Artificial Intelligence inference and fine-tuning

On December 2, 2025, Tether Data released QVAC Fabric LLM, an edge-first LLM inference runtime and fine-tuning framework that runs and personalizes models on consumer GPUs, laptops, and smartphones. The open-source platform enables on-device Artificial Intelligence training and inference across iOS, Android, Windows, macOS, and Linux while avoiding cloud dependency and vendor lock-in.
