Study finds artificial intelligence models can pass CFA level III exam

New research from NYU Stern and Goodfin shows leading large language models can clear the CFA level III mock exam, including the essay section. The results go beyond a 2024 study, which found that models struggled with constructed-response questions.

A new study from the NYU Stern School of Business and Goodfin reports that today’s leading large language models can pass the CFA level III mock exam, including the essay-based constructed-response portion long considered one of the profession’s most difficult hurdles. The findings build on a 2024 study by J.P. Morgan Artificial Intelligence Research and Queen’s University, which found models could pass levels I and II but fell short on level III essays. The latest paper, “Advanced Financial Reasoning at Scale: A Comprehensive Evaluation of Large Language Models on CFA Level III,” benchmarked 23 systems, including OpenAI’s GPT-4, Google’s Gemini 2.5 and Anthropic’s Claude Opus 4.

According to the study, OpenAI’s o4-mini achieved a composite score of 79.1 percent, while Gemini 2.5 Flash scored 77.3 percent. Most models performed strongly on multiple-choice questions, but only a subset excelled on essays that demand analysis, synthesis and strategic thinking. Researchers also showed that prompting strategies matter: chain-of-thought prompting, which asks a model to reason through its answer, boosted essay accuracy by 15 percentage points. NYU Stern professor Srikanth Jagabathula said recent reasoning-focused models are increasingly able to think through problems and provide explanations for their responses.

To evaluate essays, the team used another large language model as a judge, providing it with the candidate response, the true response, contextual information and a grading rubric, and then compared its scores with those of a certified human grader. The model-based grader proved stricter than the human, assigning fewer points overall. In response to the findings, Chris Wiese, managing director of education at CFA Institute, emphasized that earning the CFA designation requires more than passing exams, citing 4,000 hours of qualifying work experience, two references, an ethics attestation and completion of practical skills modules. Wiese added that trust, human relationships, ethical judgment and professionalism remain essential, even as the utility of artificial intelligence in investment management continues to grow.
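
The study's actual prompts and code are not reproduced in this article, so what follows is only a minimal Python sketch of how the four judge inputs named above (candidate response, true response, context and rubric) might be assembled into a single grading prompt. The `EssayItem` class, its field names, the prompt wording and the `Score: <n>` reply format are all assumptions made for illustration, and no real model API is called.

```python
import re
from dataclasses import dataclass


@dataclass
class EssayItem:
    """One essay question to grade (hypothetical structure, not from the study)."""
    context: str            # case vignette and question text
    reference_answer: str   # the "true response" from the answer key
    rubric: str             # grading rubric for this question
    candidate_answer: str   # response written by the model under test
    max_points: int         # maximum score the rubric allows


def build_judge_prompt(item: EssayItem) -> str:
    """Combine the four inputs into one prompt for a judge model (assumed wording)."""
    return (
        "You are grading a CFA level III essay response.\n\n"
        f"Context:\n{item.context}\n\n"
        f"Reference answer:\n{item.reference_answer}\n\n"
        f"Grading rubric:\n{item.rubric}\n\n"
        f"Candidate response:\n{item.candidate_answer}\n\n"
        f"Award an integer from 0 to {item.max_points} "
        "and reply in the form 'Score: <n>'."
    )


def parse_score(reply: str, max_points: int) -> int:
    """Read 'Score: <n>' from the judge's reply and cap it at max_points."""
    match = re.search(r"Score:\s*(\d+)", reply)
    if match is None:
        raise ValueError("no score found in judge reply")
    return min(int(match.group(1)), max_points)
```

In a setup like this, the prompt string would be sent to whichever judge model is in use, and the parsed scores could then be set beside a human grader's marks for the same responses, which is the comparison the researchers describe.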

Looking ahead, Jagabathula cautioned against assuming models can replace financial professionals. In a small preliminary study, users who sought financial advice from both a model and a human found the model excelled at precise questions with clear answers but struggled to capture unstated context, which affected user trust. For now, he concluded, large language models can meaningfully augment the work of CFA professionals, but whether they can replace them remains uncertain.
