VeriTrail advances hallucination detection and traceability in multi-step artificial intelligence workflows

Microsoft Research introduces VeriTrail, a system that detects unsupported content in multi-step artificial intelligence processes and pinpoints error origins, enhancing trust and transparency in language model workflows.

Microsoft Research has introduced VeriTrail, a method for detecting hallucinations and providing traceability in language model-driven workflows that involve multiple generative steps. Traditional hallucination detectors typically compare a single output to its source text, an approach that falls short for complex workflows where language models generate intermediate outputs that are further synthesized into final responses. VeriTrail addresses this gap by tracing the provenance of content, allowing users to determine not only whether the final output is grounded in the source material but also to map how the output was derived through each generative stage.

The core innovation of VeriTrail lies in representing workflows as directed acyclic graphs (DAGs), where each node corresponds to a piece of text (source material, an intermediate output, or the final output) and each edge points from input to output. VeriTrail starts at the final output, extracts individual claims, and verifies each claim stepwise through the antecedent nodes back to the original source material. Each verification step uses language models in two phases: evidence selection (identifying relevant sentences from the inputs) and verdict generation (assessing whether a claim is fully supported, not fully supported, or inconclusive). This iterative backward tracing enables both provenance mapping for well-grounded claims and error localization for unsupported content, showing precisely where hallucinations enter the workflow.
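
VeriTrail's actual prompts and model calls are not reproduced here, so the traversal can only be sketched in outline. The following is a minimal Python sketch, assuming a simple node structure and using keyword-overlap stubs in place of the two language-model phases; the class, function names, and verdict labels are illustrative assumptions, not VeriTrail's interface.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One piece of text in the workflow DAG; edges point input -> output."""
    text: str
    inputs: list = field(default_factory=list)  # antecedent nodes

def select_evidence(claim, node):
    # Stand-in for the evidence-selection phase: pick sentences from the
    # node that share at least one word with the claim.
    claim_words = set(claim.lower().split())
    return [s for s in node.text.split(". ") if claim_words & set(s.lower().split())]

def verdict(claim, evidence):
    # Stand-in for the verdict phase: "fully supported" only if every
    # claim word appears somewhere in the selected evidence.
    joined = " ".join(evidence).lower()
    supported = all(w in joined for w in claim.lower().split())
    return "fully supported" if supported else "not fully supported"

def trace_claim(claim, node, trail=None):
    # Verify the claim against this node, then recurse backward through
    # its antecedents toward the original source material.
    trail = [] if trail is None else trail
    v = verdict(claim, select_evidence(claim, node))
    trail.append((node.text, v))
    if v == "fully supported":
        for parent in node.inputs:
            trace_claim(claim, parent, trail)
    return trail

# Toy three-stage workflow: source -> intermediate summary -> final output.
source = Node("the cat sat on the mat")
summary = Node("a cat sat on a mat", inputs=[source])
final = Node("the cat sat", inputs=[summary])
```

Running `trace_claim("cat sat", final)` walks from the final output back to the source, recording a verdict at each node; an unsupported verdict at any stage marks where a hallucination entered.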

Demonstrations on processes such as GraphRAG and hierarchical summarization highlight VeriTrail’s ability to assign robust verdicts and generate an evidence trail for each claim, reducing the need to manually sift through large volumes of intermediate text. Key design priorities include reliability, computational efficiency, and scalability: VeriTrail cross-checks returned evidence IDs to prevent hallucinated evidence, minimizes redundant node verification, and handles arbitrarily large graphs by splitting operations across multiple prompts when needed. Evaluation across fiction and news datasets, including DAGs with over 100,000 nodes, shows VeriTrail outperforming standard natural language inference models, retrieval-augmented generation, and long-context models. It also offers transparent tracebacks: when hallucinations occur, users can identify precisely which workflow stage introduced them. The result is a method that empowers developers and users to verify, debug, and trust their artificial intelligence-driven outputs by surfacing both the lineage and reliability of each generated claim.
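
The evidence ID cross-check mentioned above can be sketched as a simple membership filter, assuming the input sentences are presented to the model as numbered items (the function name and interface are illustrative, not VeriTrail's API):

```python
def validate_evidence_ids(returned_ids, input_sentences):
    """Discard any evidence ID the model returned that does not refer to
    an actual sentence in the numbered input, i.e. hallucinated evidence."""
    valid = set(range(len(input_sentences)))
    return [i for i in returned_ids if i in valid]

# The model claims sentences 0, 2, and 7 as evidence, but only three
# sentences were provided, so ID 7 is dropped.
kept = validate_evidence_ids([0, 2, 7], ["First.", "Second.", "Third."])
```

Here `kept` is `[0, 2]`: only IDs that map back to real input sentences survive, so every verdict rests on evidence that verifiably exists.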
