The recent wave of mistakes in court filings has exposed how fallible both generative tools and human review can be. Lawyers at prominent firms submitted motions citing cases that did not exist, a Stanford professor filed sworn testimony containing hallucinations, and judges have been forced to reprimand parties or withdraw orders after errors surfaced. At the same time, many on the bench are trying out artificial intelligence themselves, hoping to unclog backlogs by delegating rote work such as summarizing materials, creating timelines, and drafting routine orders.
Those early experiments have produced a split portrait. Some judges, like Xavier Rodriguez of the Western District of Texas, view certain tasks as low risk and appropriate for tool assistance. Rodriguez uses generative models to extract key players, assemble timelines, and generate hearing questions, saying these uses do not supplant judgment and are easy to check. Other judges and scholars warn that the line between safe and unsafe delegation shifts with the task and the judge. Researcher Erin Solovey notes that a plausible-sounding summary or timeline from a model can be factually wrong, and that differences in how models are trained and whom they are built for magnify the danger.
Practical guidance has started to emerge. A set of principles published by the Sedona Conference suggested a menu of potentially safe uses while stressing verification and noting that "no known GenAI tools have fully resolved the hallucination problem." Magistrate Judge Allison Goddard experiments with ChatGPT, Claude, and other models as a thought partner, and she encourages clerks to use tools that do not train on user conversations. Still, she relies on established legal databases for law-specific tasks and avoids using general models for criminal matters, where bias could skew outcomes.
Not all voices are reassured. Judge Scott Schlegel of the Fifth Circuit warns that judicial reliance on flawed outputs could produce a "crisis waiting to happen," since judges cannot easily walk back a ruling once it is issued. Recent incidents include a Georgia appellate order that relied on made-up cases, a withdrawn opinion in New Jersey, and a Mississippi judge who reissued a decision without explaining the errors. The debate now centers on governance: how to harness time-saving benefits for narrow, monitored tasks while protecting core judicial decision making and maintaining public confidence in the courts.