Study finds artificial intelligence models can pass CFA Level III exam

New research from NYU Stern and Goodfin shows leading large language models can clear the CFA Level III mock exam, including the essay section. The results go beyond 2024 findings that models struggled with constructed-response questions.

A new study from the NYU Stern School of Business and Goodfin reports that today’s leading large language models can pass the CFA Level III mock exam, including the essay-based constructed-response portion long considered one of the profession’s most difficult hurdles. The findings build on a 2024 study by J.P. Morgan Artificial Intelligence Research and Queen’s University, which found models could pass Levels I and II but fell short on Level III essays. The latest paper, Advanced Financial Reasoning at Scale: A Comprehensive Evaluation of Large Language Models on CFA Level III, benchmarked 23 systems, including OpenAI’s GPT-4, Google’s Gemini 2.5 and Anthropic’s Claude Opus 4.

According to the study, OpenAI’s o4-mini achieved a composite score of 79.1 percent, while Gemini 2.5 Flash scored 77.3 percent. Most models performed strongly on multiple-choice questions, but only a subset excelled on essays that demand analysis, synthesis and strategic thinking. Researchers also showed that prompting strategies matter: chain-of-thought prompting, which asks a model to reason through its answer, boosted essay accuracy by 15 percentage points. NYU Stern professor Srikanth Jagabathula said recent reasoning-focused models are increasingly able to think through problems and provide explanations for their responses.
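The chain-of-thought technique mentioned above amounts to adding an explicit "reason step by step" instruction to the prompt before the model answers. A minimal sketch of the idea in Python follows; the prompt wording and function name are illustrative assumptions, not the study's actual prompts.

```python
def build_exam_prompt(question: str, chain_of_thought: bool = False) -> str:
    """Build an essay-exam prompt, optionally with a chain-of-thought instruction.

    The exact wording here is hypothetical; the study's real prompts are not
    reproduced in the article.
    """
    prompt = (
        "You are sitting a CFA Level III mock exam. Answer the following "
        f"constructed-response question.\n\nQuestion: {question}\n"
    )
    if chain_of_thought:
        # Chain-of-thought prompting: ask the model to show its reasoning
        # before committing to a final answer.
        prompt += (
            "\nThink through the problem step by step, showing your reasoning, "
            "then state your final answer on the last line."
        )
    return prompt


# Example: the same question with and without the reasoning instruction.
q = "Recommend a strategic asset allocation for the client described above."
plain = build_exam_prompt(q)
cot = build_exam_prompt(q, chain_of_thought=True)
```

The only difference between the two prompts is the appended reasoning instruction, which is what the study's 15-percentage-point comparison isolates.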

To evaluate essays, the team used another large language model as a judge, providing the candidate response, the true response, contextual information and a grading rubric, and then compared those results with a certified human grader. The model-based grader proved stricter than the human, assigning fewer points overall. In response to the findings, Chris Wiese, managing director of education at CFA Institute, emphasized that earning the CFA designation requires more than passing exams, citing 4,000 hours of qualifying work experience, two references, an ethics attestation and completion of practical skills modules. Wiese added that trust, human relationships, ethical judgment and professionalism remain essential, even as the utility of Artificial Intelligence in investment management continues to grow.
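The grading setup described above, often called LLM-as-judge, assembles a single prompt containing the candidate response, the reference ("true") response, the contextual information and the rubric, and asks a second model to award points. A minimal sketch under those assumptions follows; the data-structure and prompt layout are illustrative, not the paper's actual harness.

```python
from dataclasses import dataclass


@dataclass
class EssayItem:
    """One constructed-response item, as described in the grading setup."""
    context: str    # case facts shown to the candidate
    rubric: str     # point-by-point grading rubric
    reference: str  # the "true" (model-answer) response
    candidate: str  # the model-generated essay response


def build_judge_prompt(item: EssayItem, max_points: int) -> str:
    """Assemble the prompt sent to the judge model (wording is hypothetical)."""
    return (
        "You are grading a CFA Level III constructed-response answer.\n\n"
        f"Context:\n{item.context}\n\n"
        f"Rubric (award up to {max_points} points):\n{item.rubric}\n\n"
        f"Reference answer:\n{item.reference}\n\n"
        f"Candidate answer:\n{item.candidate}\n\n"
        "Return only the number of points awarded."
    )
```

In the study, scores produced this way were then compared against a certified human grader, which is how the researchers observed that the model-based grader awarded fewer points overall.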

Looking ahead, Jagabathula cautioned against assuming models can replace financial professionals. In a small preliminary study, users who sought financial advice from both a model and a human found the model excelled at precise questions with clear answers but struggled to capture unstated context, which affected user trust. For now, he concluded, large language models can meaningfully augment the work of CFA professionals, but whether they can replace them remains uncertain.

Impact Score: 55

Nvidia to sell fully integrated Artificial Intelligence servers

A report picked up by Tom’s Hardware and discussed on Hacker News says Nvidia is preparing to sell fully built rack and tray assemblies that include Vera CPUs, Rubin GPUs and integrated cooling, moving beyond supplying only GPUs and components for Artificial Intelligence workloads.

Navigating new age verification laws for game developers

Governments in the UK, European Union, the United States of America and elsewhere are imposing stricter age verification rules that affect game content, social features and personalization systems. Developers must adopt proportionate age-assurance measures such as ID checks, credit card verification or Artificial Intelligence age estimation to avoid fines, bans and reputational harm.

Large language models require a new form of oversight: capability-based monitoring

The paper proposes capability-based monitoring for large language models in healthcare, organizing oversight around shared capabilities such as summarization, reasoning, translation, and safety guardrails. The authors argue this approach is more scalable than task-based monitoring inherited from traditional machine learning and can reveal systemic weaknesses and emergent behaviors across tasks.
