A new study from the NYU Stern School of Business and Goodfin reports that today’s leading large language models can pass the CFA Level III mock exam, including the essay-based constructed-response portion long considered one of the profession’s most difficult hurdles. The findings build on a 2024 study by J.P. Morgan Artificial Intelligence Research and Queen’s University, which found that models could pass Levels I and II but fell short on the Level III essays. The latest paper, Advanced Financial Reasoning at Scale: A Comprehensive Evaluation of Large Language Models on CFA Level III, benchmarked 23 systems, including OpenAI’s GPT-4, Google’s Gemini 2.5 and Anthropic’s Claude Opus 4.
According to the study, OpenAI’s o4-mini achieved a composite score of 79.1 percent, while Gemini 2.5 Flash scored 77.3 percent. Most models performed strongly on multiple-choice questions, but only a subset excelled on essays, which demand analysis, synthesis and strategic thinking. The researchers also showed that prompting strategies matter: chain-of-thought prompting, which asks a model to reason through its answer step by step, boosted essay accuracy by 15 percentage points. NYU Stern professor Srikanth Jagabathula said recent reasoning-focused models are increasingly able to think through problems and provide explanations for their responses.
To evaluate essays, the team used another large language model as a judge, providing it with the candidate response, the reference answer, contextual information and a grading rubric, and then compared its scores with those of a certified human grader. The model-based grader proved stricter than the human, assigning fewer points overall. In response to the findings, Chris Wiese, managing director of education at CFA Institute, emphasized that earning the CFA designation requires more than passing exams, citing 4,000 hours of qualifying work experience, two references, an ethics attestation and completion of practical skills modules. Wiese added that trust, human relationships, ethical judgment and professionalism remain essential, even as the utility of artificial intelligence in investment management continues to grow.
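For readers curious how a model-as-judge setup of this kind might be wired together, the following is a minimal Python sketch, not the researchers' actual code. It assembles the four inputs the article mentions (candidate response, reference answer, context and rubric) into a grading prompt, parses a score from the judge's reply, and measures how far the judge's scores sit from a human grader's. All names (`EssayItem`, `build_judge_prompt`, `grade`, `mean_gap`), the `SCORE:` reply format and the stub judge are assumptions made for illustration, and the numbers are made up rather than taken from the study.

```python
import re
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Sequence


@dataclass
class EssayItem:
    """One essay question plus everything the judge is shown (hypothetical schema)."""
    context: str
    rubric: str
    reference_answer: str
    candidate_response: str
    max_points: int


def build_judge_prompt(item: EssayItem) -> str:
    """Combine context, rubric, reference answer and candidate response into one prompt."""
    return (
        "You are grading an essay answer against a rubric.\n\n"
        f"Context:\n{item.context}\n\n"
        f"Grading rubric:\n{item.rubric}\n\n"
        f"Reference answer:\n{item.reference_answer}\n\n"
        f"Candidate response:\n{item.candidate_response}\n\n"
        f"Award between 0 and {item.max_points} points. "
        "Reply with a line of the form 'SCORE: <integer>'."
    )


def parse_score(reply: str, max_points: int) -> int:
    """Extract the integer after 'SCORE:' and cap it at the question's maximum."""
    match = re.search(r"SCORE:\s*(\d+)", reply)
    if match is None:
        raise ValueError("judge reply did not contain a score")
    return min(int(match.group(1)), max_points)


def grade(item: EssayItem, judge: Callable[[str], str]) -> int:
    """Send the prompt to the judge (any callable mapping prompt -> reply) and parse the score."""
    return parse_score(judge(build_judge_prompt(item)), item.max_points)


def mean_gap(judge_scores: Sequence[int], human_scores: Sequence[int]) -> float:
    """Average of (judge - human) per essay; a negative value means the judge is stricter."""
    return mean(j - h for j, h in zip(judge_scores, human_scores))


# Stand-in for a real model call, so the sketch runs without any API access.
def stub_judge(prompt: str) -> str:
    return "The response covers two of the three rubric points.\nSCORE: 3"


item = EssayItem(
    context="A client has a 20-year horizon and moderate risk tolerance.",
    rubric="1 point per justified recommendation, up to 4.",
    reference_answer="(reference answer text)",
    candidate_response="(model-written essay text)",
    max_points=4,
)
judge_score = grade(item, stub_judge)
gap = mean_gap([judge_score, 2], [4, 3])  # illustrative human scores, not study data
```

In a real pipeline the stub would be replaced by a call to whichever model serves as the judge; keeping the judge behind a plain callable makes it straightforward to swap models or to re-run the comparison against a human grader's scores.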
Looking ahead, Jagabathula cautioned against assuming models can replace financial professionals. In a small preliminary study, users who sought financial advice from both a model and a human found the model excelled at precise questions with clear answers but struggled to capture unstated context, which affected user trust. For now, he concluded, large language models can meaningfully augment the work of CFA professionals, but whether they can replace them remains uncertain.