OpenAI reports lower hallucination rates for GPT-5

OpenAI says GPT-5 produces fewer false claims than earlier models, especially when it can browse the web. The gains look smaller without web access, underscoring how much reliability still depends on live sourcing.

OpenAI has launched GPT-5 as a faster and more capable model for ChatGPT, highlighting stronger performance across math, coding, writing, and health advice. The company also says hallucination rates have fallen versus earlier systems. According to the GPT-5 system card, GPT-5 makes incorrect claims 9.6 percent of the time, compared with 12.9 percent for GPT-4o, which puts the new model’s hallucination rate 26 percent lower than GPT-4o’s. GPT-5 also had 44 percent fewer responses with “at least one major factual error.”
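The 26 percent figure is a relative reduction, not a gap in percentage points. The short Python sketch below works through that arithmetic using only the two rates quoted above. The function and variable names are illustrative and do not come from OpenAI.

```python
def relative_reduction(old_rate: float, new_rate: float) -> float:
    """Return the fractional drop from old_rate to new_rate."""
    if old_rate <= 0:
        raise ValueError("old_rate must be positive")
    return (old_rate - new_rate) / old_rate


# Hallucination rates in percent, as quoted in the article.
GPT_4O_RATE = 12.9
GPT_5_RATE = 9.6

# Absolute gap, in percentage points.
point_gap = GPT_4O_RATE - GPT_5_RATE

# Relative drop, as a fraction of the GPT-4o rate.
relative = relative_reduction(GPT_4O_RATE, GPT_5_RATE)

print(f"Absolute gap: {point_gap:.1f} percentage points")
print(f"Relative reduction: {relative:.0%}")
```

The absolute gap is about 3.3 percentage points, while the relative reduction rounds to the 26 percent cited in the system card.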

Those improvements still leave a meaningful error rate. Roughly one in 10 responses from GPT-5 could contain hallucinations, a notable concern given OpenAI’s emphasis on healthcare as a potential use case. OpenAI’s comparisons show that web access is a major factor in reducing inaccuracies. In evaluations with web browsing enabled, the reported hallucination rates were 9.6 percent for GPT-5, 4.5 percent for GPT-5-thinking, 12.7 percent for o3, and 12.9 percent for GPT-4o. OpenAI also tested more open-ended and complex prompts, where GPT-5 with additional reasoning performed significantly better than earlier reasoning models such as o3 and o4-mini.

The picture changes sharply when web access is removed. On OpenAI’s SimpleQA benchmark, described as a set of fact-seeking questions with short answers, hallucination rates rose substantially across all models: 47 percent for GPT-5 main, 40 percent for GPT-5-thinking, 46 percent for o3, and 52 percent for GPT-4o. GPT-5 with thinking was only modestly better than o3, while the standard GPT-5 hallucinated one percentage point more often than o3 and only five percentage points less often than GPT-4o. The results suggest that GPT-5 is notably more dependable when it can pull from current online information rather than relying only on training data.
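To make the shift easier to see, the sketch below computes each model's gap to o3 in percentage points, first with browsing enabled and then on SimpleQA without it. The rates are the ones reported in the article. The dictionary and function names are illustrative, and because the two sets of numbers come from different evaluations, the side-by-side view is only a rough comparison.

```python
# Hallucination rates in percent, as reported in the article.
WITH_BROWSING = {
    "GPT-5": 9.6,
    "GPT-5-thinking": 4.5,
    "o3": 12.7,
    "GPT-4o": 12.9,
}
SIMPLEQA_NO_BROWSING = {
    "GPT-5": 47.0,
    "GPT-5-thinking": 40.0,
    "o3": 46.0,
    "GPT-4o": 52.0,
}


def gap_vs_baseline(rates: dict[str, float], baseline: str = "o3") -> dict[str, float]:
    """Percentage-point gap to the baseline model (negative means fewer hallucinations)."""
    base = rates[baseline]
    return {
        model: round(rate - base, 1)
        for model, rate in rates.items()
        if model != baseline
    }


browsing_gaps = gap_vs_baseline(WITH_BROWSING)
offline_gaps = gap_vs_baseline(SIMPLEQA_NO_BROWSING)

for model in browsing_gaps:
    print(f"{model}: {browsing_gaps[model]:+.1f} pts with browsing, "
          f"{offline_gaps[model]:+.1f} pts on SimpleQA without browsing")
```

With browsing, standard GPT-5 sits 3.1 points below o3. On SimpleQA without browsing it sits 1 point above, while GPT-5-thinking remains 6 points below.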

Early usage has also shown that lower aggregate error rates do not eliminate visible mistakes. One GPT-5 demo explaining how planes work drew criticism from Beth Barnes, founder and CEO of AI research nonprofit METR, who said the model repeated a common misconception involving the Bernoulli effect and airflow around airplane wings. The episode reinforced a broader point in OpenAI’s own data: GPT-5 appears improved, but factual accuracy still varies significantly depending on browsing access and the type of prompt.


