Researchers at MIT have developed a new way to identify when large language models are overconfident, a persistent problem in systems that can produce fluent but inaccurate answers. Standard uncertainty checks often rely on asking the same model the same prompt multiple times to see whether its answers stay consistent. That approach captures only the model's self-consistency, so it can fail when a model repeatedly gives the same wrong answer with high certainty, creating risks in settings such as health care and finance.
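The failure mode above is easy to see in a minimal sketch of a self-consistency check. This assumes the answers have already been sampled from the model (the model call itself is omitted), and the scoring rule, majority-vote agreement, is just one common choice:

```python
from collections import Counter

def self_consistency(answers):
    """Return the majority answer and the fraction of samples agreeing with it.

    A high score only means the model is self-consistent; as the article
    notes, it may be consistently *wrong*."""
    counts = Counter(answers)
    top_answer, top_count = counts.most_common(1)[0]
    return top_answer, top_count / len(answers)

# A model repeating the same wrong answer still scores high:
answer, score = self_consistency(["Lisbon", "Lisbon", "Lisbon", "Madrid"])
# score == 0.75 regardless of whether "Lisbon" is actually correct
```

Because the score depends only on agreement among samples, nothing in it can distinguish confident-and-right from confident-and-wrong.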
The new method focuses on epistemic uncertainty, which reflects whether the chosen model is the right one for the task, rather than only how confident it sounds. To estimate that uncertainty, the researchers compare a target model’s response with answers from a small ensemble of similar models. They found that measuring semantic similarity across models gives a stronger signal than relying on one model alone. According to the team, the most effective ensemble came from models trained by different companies, because that setup produced diverse responses without being too close to the target model.
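The cross-model comparison can be sketched as follows. The similarity function here is a toy token-overlap (Jaccard) measure standing in for a real semantic-similarity model, and the formula (one minus mean similarity to the ensemble) is an illustrative assumption, not the paper's exact estimator:

```python
def token_jaccard(a, b):
    # Toy stand-in for a semantic-similarity model: word-set overlap.
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

def epistemic_uncertainty(target_answer, ensemble_answers, sim=token_jaccard):
    # Low similarity between the target model's answer and the ensemble's
    # answers signals high epistemic uncertainty: the target model may not
    # be the right model for this task.
    mean_sim = sum(sim(target_answer, e) for e in ensemble_answers) / len(ensemble_answers)
    return 1.0 - mean_sim
```

With an ensemble drawn from differently trained models, as the team recommends, agreement across the ensemble is informative in a way that agreement of one model with itself is not.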
The researchers then combined this cross-model disagreement measure with a standard estimate of aleatoric uncertainty, creating a total uncertainty metric called TU. They evaluated it on 10 realistic tasks, including question-answering, summarization, translation, and math reasoning, and found that TU identified unreliable predictions more effectively than either measure on its own. Measuring total uncertainty often required fewer model queries than calculating aleatoric uncertainty alone, which could reduce computational costs and save energy.
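One way to sketch the combination is below. The aleatoric estimate uses normalized Shannon entropy over repeated samples from the target model, a standard choice, and the additive weighting is purely illustrative, since the article does not give the exact formula for TU:

```python
import math
from collections import Counter

def aleatoric_uncertainty(samples):
    # Normalized Shannon entropy of repeated samples from the target model:
    # 0.0 when every sample agrees, 1.0 when samples are maximally spread.
    counts = Counter(samples)
    n = len(samples)
    ent = -sum((c / n) * math.log2(c / n) for c in counts.values())
    max_ent = math.log2(len(counts)) if len(counts) > 1 else 1.0
    return ent / max_ent

def total_uncertainty(aleatoric, epistemic, weight=0.5):
    # Illustrative additive combination of the two components; the actual
    # TU metric's weighting is an assumption here.
    return weight * aleatoric + (1 - weight) * epistemic
```

Under this framing, a confidently wrong answer scores low on the aleatoric term (the model is consistent with itself) but high on the epistemic term (it disagrees with the ensemble), so the combined score still flags it.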
The results suggest TU can better detect hallucinations by flagging outputs that are confidently wrong, while also helping reinforce confidently correct answers during training. The experiments showed that epistemic uncertainty works especially well on tasks with a unique correct answer, such as factual question-answering, but may be less effective for more open-ended tasks. The team plans to adapt the approach for open-ended queries and explore other kinds of aleatoric uncertainty. The work was funded, in part, by the MIT-IBM Watson Artificial Intelligence Lab.
