Researchers use model signals to detect AI hallucinations

New methods analyze activations, attention patterns and output probabilities to flag hallucinations, memorized data and unreliable responses. The work points toward monitoring systems that can catch failures as models generate text.

Researchers led by Dr. Haggai Maron at the Technion, with collaborators from other universities and NVIDIA, have developed tools that inspect large language models for signs of hallucinations, memorized training data and other unreliable outputs. The approach shifts the goal of interpretability from fully explaining model behavior to real-time monitoring of model signals such as activations, attention maps and output probability distributions.

One system, ACT-ViT, presented at NeurIPS 2025, analyzes activation patterns across all layers and tokens, treating them as a multidimensional grid that a Vision Transformer processes. It outperformed standard probing methods and still performed strongly when adapted to a previously unseen model with the main system kept fixed.
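To make the grid idea concrete, here is a minimal sketch of how activations indexed by layer and token could be cut into patches, the way a Vision Transformer cuts an image into patches before processing it. This is an illustration only, not ACT-ViT's code: the function name `patchify`, the patch sizes and the toy data are all assumptions.

```python
from typing import List

Vector = List[float]
Grid = List[List[Vector]]  # grid[layer][token] -> hidden-state vector


def patchify(grid: Grid, patch_layers: int, patch_tokens: int) -> List[Vector]:
    """Split a layers x tokens activation grid into flattened, non-overlapping patches.

    Hypothetical helper for illustration; patch sizes are arbitrary choices.
    """
    n_layers = len(grid)
    n_tokens = len(grid[0])
    if n_layers % patch_layers or n_tokens % patch_tokens:
        raise ValueError("grid dimensions must be divisible by the patch size")

    patches: List[Vector] = []
    for l0 in range(0, n_layers, patch_layers):
        for t0 in range(0, n_tokens, patch_tokens):
            patch: Vector = []
            # Concatenate every hidden-state vector inside this patch.
            for layer in range(l0, l0 + patch_layers):
                for token in range(t0, t0 + patch_tokens):
                    patch.extend(grid[layer][token])
            patches.append(patch)
    return patches


# Toy data (assumed): 4 layers, 6 tokens, hidden size 2.
toy_grid = [[[float(layer), float(token)] for token in range(6)] for layer in range(4)]
toy_patches = patchify(toy_grid, patch_layers=2, patch_tokens=3)
print(len(toy_patches), len(toy_patches[0]))  # prints: 4 12
```

In a real detector, each flattened patch would be embedded and passed to a transformer that outputs a hallucination score; that part is omitted here.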

A second method, CHARM, presented at ICLR 2026, represents attention patterns as graphs and uses a graph neural network to predict hallucinations at the token or response level. A third study, presented at AAAI 2026, introduced LOS-Net, which uses output probability distributions to detect hallucinations and data contamination in settings where internal model states are not available. Future work will explore combining activations, attention and output distributions into a broader monitoring system.
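The two input representations described above can be sketched in a few lines. The code below is a simplified illustration, not the CHARM or LOS-Net implementation: the function names, the edge threshold and the choice of summary features (top probabilities, the chosen token's probability and rank, entropy) are assumptions made for the example.

```python
import math
from typing import List, Tuple

Edge = Tuple[int, int, float]  # (query token, key token, attention weight)


def attention_to_edges(attn: List[List[float]], threshold: float) -> List[Edge]:
    """Turn one attention map into weighted directed edges, query -> key.

    Hypothetical helper: weights below `threshold` are dropped (an assumption).
    """
    edges: List[Edge] = []
    for query, row in enumerate(attn):
        for key, weight in enumerate(row):
            if weight >= threshold:
                edges.append((query, key, weight))
    return edges


def output_features(probs: List[float], chosen: int, top_k: int) -> List[float]:
    """Summarize one next-token distribution using output probabilities only."""
    top = sorted(probs, reverse=True)[:top_k]
    # Rank 0 means the chosen token was the most likely one.
    rank = sum(1 for p in probs if p > probs[chosen])
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return top + [probs[chosen], float(rank), entropy]


# Toy causal attention map over 3 tokens (assumed values).
toy_attn = [
    [1.0, 0.0, 0.0],
    [0.3, 0.7, 0.0],
    [0.1, 0.2, 0.7],
]
toy_edges = attention_to_edges(toy_attn, threshold=0.25)

# Toy next-token distribution over a 4-word vocabulary (assumed values).
toy_feats = output_features([0.5, 0.25, 0.125, 0.125], chosen=1, top_k=2)
print(len(toy_edges), toy_feats[:4])
```

A graph neural network would consume the edge list, with tokens as nodes, while a detector limited to model outputs would see only feature vectors like `toy_feats`, one per generated token.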


