OpenAI's Own Data Shows ChatGPT Hallucination Problem Worsens in Latest Releases

OpenAI's research reveals that newer ChatGPT models are generating false information more often, raising new concerns about reliability in advanced Artificial Intelligence systems.

OpenAI's internal tests show its newest large language models, particularly o3 and o4-mini, are significantly more prone to hallucinating, that is, generating false information, than prior versions. The company found that o3, touted as its most powerful model yet, produced hallucinations 33% of the time when answering questions about public figures on the PersonQA benchmark, double the rate of its predecessor, o1. The o4-mini model fared even worse, with a hallucination rate of 48%. On more general queries tested by the SimpleQA benchmark, those rates climbed to 51% for o3 and an alarming 79% for o4-mini, against 44% for o1.
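For readers unfamiliar with how such figures are produced: a benchmark hallucination rate is simply the fraction of a model's answers that graders judge to be false. A minimal sketch, using hypothetical judgment data rather than any actual OpenAI evaluation harness:

```python
def hallucination_rate(judgments: list[bool]) -> float:
    """Fraction of answers judged hallucinated.

    Each entry is True if the grader marked the answer as
    containing fabricated information, False otherwise.
    """
    if not judgments:
        raise ValueError("no judgments to score")
    return sum(judgments) / len(judgments)

# Hypothetical grading results for a 10-question benchmark run
sample = [True, False, False, True, False, False, False, True, False, False]
print(f"hallucination rate: {hallucination_rate(sample):.0%}")
```

On this toy sample the rate is 30%; real benchmarks like PersonQA and SimpleQA apply the same idea over hundreds or thousands of graded questions.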

This trend is especially perplexing given the growing focus on so-called 'reasoning' models. Such models are designed to break tasks into logical steps to achieve human-like problem-solving capabilities. OpenAI, alongside rivals like Google and DeepSeek, has championed these advancements as part of the next leap in Artificial Intelligence, claiming earlier models like o1 could outperform PhD students in certain academic fields. However, the new findings suggest that added complexity and multi-step reasoning may actually introduce more avenues for error, leading to increased hallucinations, contrary to industry hopes for greater reliability.

OpenAI acknowledges the worsening issue but disputes that reasoning models are inherently more prone to error, stating that research is ongoing to understand and mitigate the problem. Regardless, continued high rates of hallucination threaten the usefulness of large language models in real-world applications, especially where the main advantage is supposed to be saving time and labor. If outputs require meticulous double-checking, the incentive to use such technology diminishes. The current challenge for OpenAI and the wider Artificial Intelligence sector remains clear: without addressing this 'robot dreams' problem, trust in these systems will be difficult to establish.

BitUnlocker bypasses TPM-only Windows 11 BitLocker

Intrinsec disclosed BitUnlocker, a downgrade attack that can bypass TPM-only Windows 11 BitLocker protections with physical access to a machine. The technique abuses a flaw in Windows recovery and deployment components and relies on older trusted boot code.

Micron samples 256 GB DDR5 9200 MT/s RDIMM server modules

Micron has begun sampling 256 GB DDR5 RDIMM server modules built on its 1-gamma technology to key ecosystem partners. The company positions the new modules as a higher-speed, more power-efficient option for scaling next-generation Artificial Intelligence and HPC infrastructure.

Microsoft emails show early doubts about OpenAI

Court emails show Microsoft executives were unconvinced by OpenAI’s early Artificial Intelligence progress in 2018 while also worrying that rejecting the lab could push it toward Amazon. The messages reveal internal tension between skepticism over technical claims and concern about competitive and public relations fallout.
