Legal artificial intelligence in the large language model era reshapes data, modeling and evaluation

Large language models are moving beyond narrow task solving in legal artificial intelligence to become core components of data pipelines, modeling frameworks and evaluation workflows across key legal tasks.

Legal artificial intelligence is entering a new phase as large language models become central to how legal data is processed, how models are built and how systems are evaluated. The field, defined as the use of artificial intelligence technologies to automate legal tasks, has seen its capabilities significantly enhanced by recent advances in large language models. These models no longer only improve accuracy on benchmark tasks; they now assume multiple roles across the lifecycle of legal natural language processing, from data construction and augmentation to inference-time reasoning and post hoc assessment of system behavior.

A survey of the field introduces a role-based schema that categorizes the involvement of large language models along three main perspectives: data, modeling and evaluation. On the data side, large language models are used to generate synthetic legal questions, summaries, rationales and annotations, to assist in data cleaning, to support data augmentation, and to help structure unorganized legal text into machine-usable formats, thereby addressing the scarcity and imbalance of labeled legal corpora. On the modeling side, large language models function as task solvers through prompting and fine-tuning, as reasoning engines that incorporate legal principles such as syllogistic or deontic logic, as components in hybrid neuro-symbolic or retrieval-augmented architectures, and as collaborative partners to smaller domain models. On the evaluation side, large language models are enlisted to provide relevance judgments in retrieval, to annotate legal factors or argument structures, to assess hallucinations and factuality in generated content, and to enable more nuanced qualitative analyses of legal system performance.
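
The three-perspective schema described above can be pictured as a small lookup structure. The Python sketch below is only an illustration under stated assumptions: the role labels paraphrase this summary, and the names `Perspective`, `ROLES` and `perspective_of` are hypothetical, not identifiers taken from the survey.

```python
from enum import Enum


class Perspective(Enum):
    """The three perspectives named in the role-based schema."""
    DATA = "data"
    MODELING = "modeling"
    EVALUATION = "evaluation"


# Assumption: these role labels paraphrase the summary text; they are
# illustrative identifiers, not terminology confirmed by the survey.
ROLES = {
    Perspective.DATA: [
        "synthetic_data_generation",
        "data_cleaning",
        "data_augmentation",
        "text_structuring",
    ],
    Perspective.MODELING: [
        "task_solver",
        "reasoning_engine",
        "hybrid_component",
        "collaborator_with_small_models",
    ],
    Perspective.EVALUATION: [
        "relevance_judge",
        "annotator",
        "hallucination_assessor",
        "qualitative_analyst",
    ],
}


def perspective_of(role: str) -> Perspective:
    """Return the perspective that a role label is filed under."""
    for perspective, roles in ROLES.items():
        if role in roles:
            return perspective
    raise KeyError(f"unknown role: {role}")


if __name__ == "__main__":
    # Look up where a retrieval relevance judge sits in the schema.
    print(perspective_of("relevance_judge").value)
```

Filing each role under exactly one perspective keeps the lookup unambiguous; a system in which one model plays several roles at once would need a list of roles per system rather than a single label.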

Using this schema, the survey systematically reviews work in three major categories of legal tasks: legal classification, legal retrieval and legal generation. In legal classification, large language models support judgment prediction, topic and rule classification, contract clause analysis and multi-label document tagging, often through prompt learning or as part of multi-stage frameworks. In legal retrieval, they enhance case law and statute retrieval by reasoning over implicit concepts, collaborating with graph-based methods, and serving as judges or rerankers in competitions such as COLIEE. In legal generation, they power abstractive summarization, question answering, judgment and court view generation, regulatory summarization and simulation of courtroom debates. A detailed quantitative comparison of effectiveness across roles and tasks indicates that the impact of large language models depends strongly on their assigned role and the characteristics of the underlying legal task, highlighting both the promise and the limitations of current approaches and pointing to open challenges in robustness, interpretability, ethics and domain adaptation.


