Judges experiment with Artificial Intelligence amid hallucination risks

Judges across the US are using Artificial Intelligence to speed research and draft routine orders, but hallucinations in generated text have forced reissued rulings and prompted calls for clearer boundaries.

The recent wave of mistakes in court filings has exposed how fallible both generative tools and human review can be. Lawyers at prominent firms submitted motions citing cases that did not exist, a Stanford professor filed sworn testimony containing hallucinations, and judges have been forced to reprimand parties or withdraw orders after errors surfaced. At the same time, many on the bench are trying out Artificial Intelligence themselves, hoping to unclog backlogs by delegating rote work such as summarizing materials, creating timelines, and drafting routine orders.

Those early experiments have produced a split portrait. Some judges, like Xavier Rodriguez of the Western District of Texas, view certain tasks as low risk and appropriate for tool assistance. Rodriguez uses generative models to extract key players, assemble timelines, and generate hearing questions, saying these uses do not supplant judgment and are easy to check. Other judges and scholars warn that the line between safe and unsafe delegation shifts with the task and the judge. Researcher Erin Solovey notes that a plausible-sounding summary or timeline from a model can be factually wrong, and that differences in how models are trained and in their intended audiences magnify the danger.

Practical guidance has started to emerge. A set of principles published by the Sedona Conference suggested a menu of potentially safe uses while stressing verification and noting that “no known GenAI tools have fully resolved the hallucination problem.” Magistrate Judge Allison Goddard experiments with ChatGPT, Claude, and other models as a thought partner, and she encourages her clerks to use tools that do not train on user conversations. Still, she relies on established legal databases for law-specific tasks and avoids using general models for criminal matters where bias could skew outcomes.

Not all voices are reassured. Judge Scott Schlegel of the Fifth Circuit warns that judicial reliance on flawed outputs could produce a “crisis waiting to happen,” since judges cannot easily rescind law once issued. Recent incidents include a Georgia appellate order that relied on made-up cases, a withdrawn opinion in New Jersey, and a Mississippi judge who reissued a decision without explaining the errors. The debate now centers on governance: how to harness time-saving benefits for narrow, monitored tasks while protecting core judicial decision making and maintaining public confidence in the courts.
