Microsoft Research has introduced VeriTrail, a method for detecting hallucinations and providing traceability in language model-driven workflows that involve multiple generative steps. Traditional hallucination detectors typically compare a single output to its source text, an approach that falls short for complex workflows in which language models generate intermediate outputs that are further synthesized into final responses. VeriTrail addresses this gap by tracing the provenance of content, allowing users not only to determine whether the final output is grounded in the source material but also to map how that output was derived at each generative stage.
The core innovation of VeriTrail lies in representing workflows as directed acyclic graphs (DAGs), where each node corresponds to a piece of text (source material, an intermediate output, or the final output) and each edge points from input to output. VeriTrail starts at the final output, extracts individual claims, and then verifies these claims stepwise through the antecedent nodes back to the original source material. At each verification step, the system uses language models in two phases: evidence selection (identifying relevant sentences from the node's inputs) and verdict generation (assessing whether a claim is fully supported, not fully supported, or inconclusive). This iterative backward tracing enables both provenance mapping for well-grounded claims and error localization for unsupported content, showing precisely where hallucinations enter the workflow.
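To make the traversal concrete, here is a minimal Python sketch of this backward verification walk, based only on the description above. All names (`TextNode`, `extract_claims`, `select_evidence`, `generate_verdict`, `verify_claim`) are hypothetical, and the two language-model phases are replaced with trivial stubs so the control flow is runnable; it illustrates the idea, not VeriTrail's actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class TextNode:
    """A node in the workflow DAG: source text, an intermediate, or the final output."""
    node_id: str
    text: str
    inputs: list["TextNode"] = field(default_factory=list)  # edges point input -> output

def extract_claims(final_output: TextNode) -> list[str]:
    # Stub: in practice an LM would decompose the final output into atomic claims.
    return [s.strip() for s in final_output.text.split(".") if s.strip()]

def select_evidence(claim: str, inputs: list[TextNode]) -> list[tuple[str, str]]:
    # Phase 1 stub: an LM would select relevant sentences from the node's inputs.
    # Here, any input sentence sharing a word with the claim counts as evidence.
    claim_words = set(claim.lower().split())
    evidence = []
    for node in inputs:
        for sentence in node.text.split("."):
            if claim_words & set(sentence.lower().split()):
                evidence.append((node.node_id, sentence.strip()))
    return evidence

def generate_verdict(claim: str, evidence: list[tuple[str, str]]) -> str:
    # Phase 2 stub: an LM would judge the claim against the selected evidence.
    return "fully supported" if evidence else "not fully supported"

def verify_claim(claim: str, node: TextNode, trail: list) -> str:
    """Trace a claim from `node` back toward the sources, recording the trail."""
    if not node.inputs:
        return "fully supported"  # reached original source material
    evidence = select_evidence(claim, node.inputs)
    verdict = generate_verdict(claim, evidence)
    trail.append((node.node_id, verdict, evidence))
    if verdict != "fully supported":
        return verdict  # error localized: the claim first loses support at this step
    # Recurse into each antecedent node that actually contributed evidence.
    cited = {nid for nid, _ in evidence}
    for input_node in node.inputs:
        if input_node.node_id in cited:
            result = verify_claim(claim, input_node, trail)
            if result != "fully supported":
                return result
    return "fully supported"

# Tiny two-step workflow: source -> summary -> final answer.
source = TextNode("src", "The moon orbits Earth. Tides follow the moon")
summary = TextNode("sum", "The moon drives the tides", inputs=[source])
final = TextNode("out", "Tides are driven by the moon", inputs=[summary])

for claim in extract_claims(final):
    trail: list = []
    print(claim, "->", verify_claim(claim, final, trail), trail)
```

In this toy run, a claim that survives every step yields a complete evidence trail from the final answer back to the source; a claim that fails at some node is flagged there, which is exactly the error localization described above.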
Demonstrations on processes such as GraphRAG and hierarchical summarization highlight VeriTrail's ability to assign robust verdicts and generate an evidence trail for each claim, reducing the need to manually sift through large volumes of intermediate text. Key design priorities include reliability, computational efficiency, and scalability: VeriTrail cross-checks returned evidence IDs to reject hallucinated evidence, minimizes redundant node verification, and handles arbitrarily large graphs by splitting operations across multiple prompts when needed (sketched below). Evaluation across datasets of fiction and news content, including DAGs with over 100,000 nodes, shows VeriTrail outperforming standard natural language inference models, retrieval-augmented generation, and long-context models. Uniquely, it offers transparent tracebacks: when hallucinations occur, users can identify precisely which workflow stage introduced the errors. The result is a method that empowers developers and users to verify, debug, and trust their AI-driven outputs by surfacing both the lineage and reliability of each generated claim.
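As an illustration of those reliability and scalability mechanisms, the sketch below shows the two simplest pieces in Python: discarding evidence IDs that do not correspond to real input sentences, and splitting a large pool of candidate sentences across multiple evidence-selection prompts. The function names and the ID scheme are assumptions made for illustration, not VeriTrail's actual interface.

```python
def filter_evidence_ids(returned_ids: list[str],
                        sentence_index: dict[str, str]) -> list[str]:
    # Cross-check: keep only IDs that map to real input sentences, so the
    # model cannot cite evidence it invented. (Hypothetical helper.)
    return [eid for eid in returned_ids if eid in sentence_index]

def batch_for_prompts(sentence_index: dict[str, str],
                      max_per_prompt: int) -> list[dict[str, str]]:
    # Scalability: when a node's inputs exceed one prompt's budget, split the
    # candidate sentences across several evidence-selection prompts.
    items = list(sentence_index.items())
    return [dict(items[i:i + max_per_prompt])
            for i in range(0, len(items), max_per_prompt)]

# Example: "S2" is returned by the model but does not exist in the index.
index = {"S0": "The moon orbits Earth", "S1": "Tides follow the moon"}
print(filter_evidence_ids(["S0", "S2"], index))         # ['S0']
print(len(batch_for_prompts(index, max_per_prompt=1)))  # 2 prompts
```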