Legal artificial intelligence in the large language model era reshapes data, modeling and evaluation

Large language models are transforming legal artificial intelligence, evolving from narrow task solvers into core components of data pipelines, modeling frameworks and evaluation workflows across key legal tasks.

Legal artificial intelligence, the use of artificial intelligence techniques to automate legal tasks, is entering a new phase as large-scale language models become central to how legal data is processed, how models are built and how systems are evaluated. Large language models no longer merely improve accuracy on benchmark tasks; they now assume multiple roles across the lifecycle of legal natural language processing, from data construction and augmentation to inference-time reasoning and post hoc assessment of system behavior.

The survey introduces a role-based schema that categorizes the involvement of large language models along three main perspectives: data, modeling and evaluation. On the data side, large language models are used to generate synthetic legal questions, summaries, rationales and annotations, to assist in data cleaning, to support data augmentation, and to help structure unorganized legal text into machine-usable formats, thereby addressing the scarcity and imbalance of labeled legal corpora. On the modeling side, large language models function as task solvers through prompting and fine-tuning, as reasoning engines that incorporate legal principles such as syllogistic or deontic logic, as components in hybrid neuro-symbolic or retrieval-augmented architectures, and as collaborative partners to smaller domain models. On the evaluation side, large language models are enlisted to provide relevance judgments in retrieval, to annotate legal factors or argument structures, to assess hallucinations and factuality in generated content, and to enable more nuanced qualitative analyses of legal system performance.
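To make the data role concrete, the sketch below shows how a large language model might be prompted to annotate unlabeled contract clauses with labels from a fixed taxonomy, one of the annotation uses described above. It is a minimal illustration rather than a method from the survey: the OpenAI-compatible client, the `gpt-4o-mini` model name, the label set and the prompt wording are all assumptions chosen for the example.

```python
# Minimal sketch: an LLM as a data annotator for legal text.
# The model name, label taxonomy, and prompt are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

LABELS = ["limitation_of_liability", "termination", "governing_law", "other"]

def annotate_clause(clause: str) -> str:
    """Ask the model to assign exactly one label from LABELS to a clause."""
    prompt = (
        "You are annotating contract clauses for a legal NLP dataset.\n"
        f"Allowed labels: {', '.join(LABELS)}.\n"
        "Reply with exactly one label and nothing else.\n\n"
        f"Clause: {clause}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # hypothetical choice; any instruction-tuned model works
        messages=[{"role": "user", "content": prompt}],
        temperature=0.0,  # deterministic output for annotation consistency
    )
    label = response.choices[0].message.content.strip()
    return label if label in LABELS else "other"  # guard against off-taxonomy replies

print(annotate_clause(
    "Either party may terminate this Agreement upon thirty days written notice."
))
```

Running such an annotator over a raw corpus, then having humans spot-check a sample, is one common way the scarcity of labeled legal data is addressed in practice.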

Using this schema, the survey systematically reviews work in three major categories of legal tasks: legal classification, legal retrieval and legal generation. In legal classification, large language models support judgment prediction, topic and rule classification, contract clause analysis and multi-label document tagging, often through prompt learning or as part of multi-stage frameworks. In legal retrieval, they enhance case law and statute retrieval by reasoning over implicit concepts, collaborating with graph-based methods, and serving as judges or rerankers in competitions such as COLIEE. In legal generation, they power abstractive summarization, question answering, judgment and court view generation, regulatory summarization and simulation of courtroom debates. A detailed quantitative comparison of effectiveness across roles and tasks indicates that the impact of large language models depends strongly on their assigned role and the characteristics of the underlying legal task. This comparison highlights both the promise and the limitations of current approaches and points to open challenges in robustness, interpretability, ethics and domain adaptation.
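As an illustration of the reranking role in legal retrieval, the sketch below has a large language model score first-stage candidates for relevance to a query case and reorder them. This is an assumed setup, not a specific COLIEE system: the client, model name and scoring prompt are placeholders.

```python
# Minimal sketch: an LLM as a second-stage reranker over retrieved cases.
# Prompt wording and model name are illustrative assumptions.
from openai import OpenAI

client = OpenAI()

def score_relevance(query_case: str, candidate_case: str) -> int:
    """Return a 0-10 relevance score assigned by the model."""
    prompt = (
        "Rate how relevant the candidate case is to the query case for the "
        "purpose of citing it as a precedent. Reply with a single integer "
        "from 0 (irrelevant) to 10 (highly relevant).\n\n"
        f"Query case:\n{query_case}\n\nCandidate case:\n{candidate_case}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # hypothetical choice
        messages=[{"role": "user", "content": prompt}],
        temperature=0.0,
    )
    try:
        return int(response.choices[0].message.content.strip())
    except ValueError:
        return 0  # treat unparseable replies as irrelevant

def rerank(query_case: str, candidates: list[str]) -> list[str]:
    """Order first-stage candidates by model-assigned relevance, best first."""
    return sorted(candidates, key=lambda c: score_relevance(query_case, c), reverse=True)
```

The same pattern, with the numeric score replaced by a graded judgment, is how large language models are used to provide relevance judgments when evaluating retrieval systems.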


