MIT method spots overconfident Artificial Intelligence models

MIT researchers developed a way to detect when large language models are confidently wrong by comparing their answers with outputs from similar models. The combined uncertainty measure outperformed standard techniques across a range of tasks and may help reduce unreliable responses.

Researchers at MIT have developed a new way to identify when large language models are overconfident, a persistent problem in systems that can produce fluent but inaccurate answers. Standard uncertainty checks often rely on asking the same model the same prompt multiple times to see whether it stays consistent. That approach captures self-confidence, but it can fail when a model repeatedly gives the same wrong answer with high certainty, creating risks in settings such as health care and finance.

The new method focuses on epistemic uncertainty, which reflects whether the chosen model is the right one for the task, rather than only how confident it sounds. To estimate that uncertainty, the researchers compare a target model’s response with answers from a small ensemble of similar models. They found that measuring semantic similarity across models gives a stronger signal than relying on one model alone. According to the team, the most effective ensemble came from models trained by different companies, because that setup produced diverse responses without being too close to the target model.
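The cross-model comparison described above can be sketched in a few lines. This is a minimal illustration, not the researchers' implementation: the function names are hypothetical, and a crude token-overlap (Jaccard) score stands in for a real semantic-similarity model.

```python
def semantic_similarity(a: str, b: str) -> float:
    """Crude stand-in for a semantic similarity model: token-set Jaccard overlap.
    A real system would use embeddings or an NLI-style comparison instead."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

def epistemic_uncertainty(target_answer: str, ensemble_answers: list[str]) -> float:
    """Mean semantic *dis*similarity between the target model's answer and
    answers from an ensemble of similar models; higher means more disagreement,
    suggesting the target model may be the wrong model for the task."""
    sims = [semantic_similarity(target_answer, ans) for ans in ensemble_answers]
    return 1.0 - sum(sims) / len(sims)
```

When the ensemble agrees with the target model, the score approaches 0; when the ensemble's answers diverge, it rises toward 1, flagging the response even if the target model itself is highly confident.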

The researchers then combined this cross-model disagreement measure with a standard estimate of aleatoric uncertainty, creating a total uncertainty metric called TU. They evaluated it on 10 realistic tasks, including question-answering, summarization, translation, and math reasoning. Their method more effectively identified unreliable predictions than either measure on its own. Measuring total uncertainty often required fewer queries than calculating aleatoric uncertainty, which could reduce computational costs and save energy.
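One plausible way to combine the two signals into a single TU score is a weighted sum, with aleatoric uncertainty estimated from self-consistency across repeated samples. The article does not specify the paper's actual combination formula, so the weighting below is purely illustrative.

```python
def pairwise_disagreement(samples: list[str]) -> float:
    """Mean pairwise dissimilarity among repeated answers from the *same* model,
    a common self-consistency proxy for aleatoric uncertainty."""
    n = len(samples)
    if n < 2:
        return 0.0
    total, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            ta = set(samples[i].lower().split())
            tb = set(samples[j].lower().split())
            union = ta | tb
            sim = len(ta & tb) / len(union) if union else 1.0
            total += 1.0 - sim
            pairs += 1
    return total / pairs

def total_uncertainty(epistemic: float, aleatoric: float, weight: float = 0.5) -> float:
    """Illustrative TU: a weighted blend of cross-model disagreement (epistemic)
    and self-consistency disagreement (aleatoric)."""
    return weight * epistemic + (1.0 - weight) * aleatoric
```

A response is flagged as unreliable when TU exceeds a chosen threshold; because the epistemic term comes from a small ensemble rather than many repeated queries, this style of estimate can need fewer model calls, consistent with the cost savings the article mentions.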

The results suggest TU can better detect hallucinations by flagging outputs that are confidently wrong, while also helping reinforce confidently correct answers during training. The experiments showed that epistemic uncertainty works especially well on tasks with a unique correct answer, such as factual question-answering, but may be less effective for more open-ended tasks. The team plans to adapt the approach for open-ended queries and explore other kinds of aleatoric uncertainty. The work was funded, in part, by the MIT-IBM Watson Artificial Intelligence Lab.

Impact Score: 55

MEPs back delay for parts of Artificial Intelligence Act

European Parliament committees have endorsed targeted delays to parts of the Artificial Intelligence Act while adding a proposed ban on certain non-consensual image manipulation tools. The changes aim to give companies clearer deadlines, reduce overlap with other EU rules, and extend support to small mid-cap enterprises.

Publisher alliance seeks leverage over Artificial Intelligence web access

A new publisher coalition is trying to reshape how Artificial Intelligence companies access journalism by combining collective bargaining with tougher technical controls. The effort reflects growing pressure on Artificial Intelligence firms to pay for content used in training, search, and user-facing responses.

Military advantage in the age of algorithmic diffusion

American leadership in Artificial Intelligence research and infrastructure may not translate into lasting military advantage. Rapid diffusion of algorithms is shifting the contest toward compute, talent, and the speed of military adoption.

Artificial Intelligence adoption rises among small businesses

Small businesses are increasingly using Artificial Intelligence and reporting strong gains in efficiency, productivity, and expected revenue. Many still face practical barriers and want more training, resources, and policy support to move from experimentation to full implementation.

Corporate legal teams in 2026

In-house legal teams are being pushed beyond traditional advisory roles into strategic business functions spanning contracts, compliance, governance, and risk. Artificial Intelligence is increasingly central to that shift, especially in high-volume workflows such as contract review, due diligence, and regulatory monitoring.
