MIT method spots overconfident Artificial Intelligence models

MIT researchers developed a way to detect when large language models are confidently wrong by comparing their answers with outputs from similar models. The combined uncertainty measure outperformed standard techniques across a range of tasks and may help reduce unreliable responses.

Researchers at MIT have developed a new way to identify when large language models are overconfident, a persistent problem in systems that can produce fluent but inaccurate answers. Standard uncertainty checks often rely on asking the same model the same prompt multiple times to see whether it stays consistent. That approach captures self-confidence, but it can fail when a model repeatedly gives the same wrong answer with high certainty, creating risks in settings such as health care and finance.

The new method focuses on epistemic uncertainty, which reflects whether the chosen model is the right one for the task, rather than only how confident it sounds. To estimate that uncertainty, the researchers compare a target model’s response with answers from a small ensemble of similar models. They found that measuring semantic similarity across models gives a stronger signal than relying on one model alone. According to the team, the most effective ensemble came from models trained by different companies, because that setup produced diverse responses without being too close to the target model.

The researchers then combined this cross-model disagreement measure with a standard estimate of aleatoric uncertainty, creating a total uncertainty metric called TU. They evaluated it on 10 realistic tasks, including question-answering, summarization, translation, and math reasoning. Their method more effectively identified unreliable predictions than either measure on its own. Measuring total uncertainty often required fewer queries than calculating aleatoric uncertainty, which could reduce computational costs and save energy.
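The idea described above can be sketched in code. This is an illustrative approximation only: the article does not specify the team's similarity measure or combination rule, so the token-overlap `jaccard_similarity` (a crude stand-in for semantic similarity) and the simple additive TU below are assumptions, not the published method.

```python
def jaccard_similarity(a: str, b: str) -> float:
    """Crude stand-in for semantic similarity: token-set overlap."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

def aleatoric_uncertainty(samples: list[str]) -> float:
    """Disagreement among repeated answers from the *same* model."""
    pairs = [(i, j) for i in range(len(samples))
             for j in range(i + 1, len(samples))]
    if not pairs:
        return 0.0
    mean_sim = sum(jaccard_similarity(samples[i], samples[j])
                   for i, j in pairs) / len(pairs)
    return 1.0 - mean_sim

def epistemic_uncertainty(target_answer: str, ensemble_answers: list[str]) -> float:
    """Disagreement between the target model and a small ensemble of peers."""
    if not ensemble_answers:
        return 0.0
    mean_sim = sum(jaccard_similarity(target_answer, a)
                   for a in ensemble_answers) / len(ensemble_answers)
    return 1.0 - mean_sim

def total_uncertainty(target_samples: list[str], ensemble_answers: list[str]) -> float:
    """TU sketch: combine self-consistency with cross-model disagreement."""
    return (aleatoric_uncertainty(target_samples)
            + epistemic_uncertainty(target_samples[0], ensemble_answers))

# A confidently wrong model repeats one answer (low aleatoric uncertainty)
# but disagrees with the ensemble (high epistemic uncertainty), so TU stays high.
wrong = total_uncertainty(["paris is in germany"] * 3,
                          ["paris is in france", "paris france"])
right = total_uncertainty(["paris is in france"] * 3,
                          ["paris is in france", "paris france"])
print(wrong > right)
```

The key behavior the article highlights falls out of this structure: self-consistency checks alone score the "confidently wrong" case as certain, while the cross-model term still flags it.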

The results suggest TU can better detect hallucinations by flagging outputs that are confidently wrong, while also helping reinforce confidently correct answers during training. The experiments showed that epistemic uncertainty works especially well on tasks with a unique correct answer, such as factual question-answering, but may be less effective for more open-ended tasks. The team plans to adapt the approach for open-ended queries and explore other kinds of aleatoric uncertainty. The work was funded, in part, by the MIT-IBM Watson Artificial Intelligence Lab.

Impact Score: 55

AMD plans specialized EPYC CPUs for Artificial Intelligence, HPC, and cloud

AMD is preparing a broader EPYC strategy with task-specific server CPUs aimed at agentic Artificial Intelligence, HPC, training and inference, and cloud deployments. The shift starts with the Zen 6 generation and adds Verano as an Artificial Intelligence-focused variant within the same EPYC family.

Nvidia expands Spectrum-X Ethernet with open MRC protocol

Nvidia is positioning Spectrum-X Ethernet as a foundation for large-scale Artificial Intelligence training, with Multipath Reliable Connection (MRC) adding open, multi-path RDMA transport for higher resilience and throughput. OpenAI, Microsoft, and Oracle are among the organizations using the technology in large Artificial Intelligence environments.

Anthropic explores Fractile chips to diversify supply

Anthropic is reportedly in early talks with London-based Fractile to secure high-performance Artificial Intelligence chips for inference workloads. The move would reduce reliance on Nvidia and broaden the company’s hardware supply chain.

OpenAI curbs odd creature references in chatbot responses

OpenAI has adjusted its models after users complained about overly familiar responses and strange references to goblins, gremlins, pigeons, and raccoons. The company traced the behavior to a retired “nerdy” personality whose habits spread into broader model training.
