The landscape of mathematics is changing rapidly as artificial intelligence becomes increasingly capable of tackling complex problems. The US Defense Advanced Research Projects Agency (DARPA) has launched the expMath program to speed up the traditionally slow pace of mathematical breakthroughs. Its vision centers on an artificial intelligence ‘coauthor,’ a tool that can decompose monumental math problems into manageable, solvable parts. The hope is that artificial intelligence will not just assist with routine calculations but help unlock discoveries previously deemed unreachable.
In recent years, large reasoning models (LRMs) such as OpenAI’s o3 and Anthropic’s Claude 4 Thinking have set new benchmarks by solving high-level math problems, including those found on the American Invitational Mathematics Examination (AIME) and the International Mathematical Olympiad. Hybrid models like AlphaProof, developed by Google DeepMind, combine language models with advanced game-playing systems, achieving feats previously reserved for top human competitors. AlphaEvolve, another DeepMind creation, has reportedly outperformed humans on more than 50 open math and computer science problems. However, these successes draw largely on the recurring patterns and recognizable tricks of competition problems, which differ vastly from the exploratory, open-ended challenges of mathematical research.
This distinction has prompted the development of new benchmarks like Epoch AI’s FrontierMath, designed in collaboration with mathematicians to push artificial intelligence further with entirely novel problems that demand hours of expert-level reasoning. While leading language models achieve near-perfect scores on standardized tests, they still struggle to surpass 20% on these new, domain-driven challenges, exposing the limits of current technology. Researchers like Sergei Gukov at Caltech have begun developing approaches that condense sequences of mathematical reasoning steps into ‘supermoves,’ allowing reinforcement-learning systems to test entire lines of attack on long-standing open problems such as the Andrews-Curtis conjecture, which could save years of human effort.
Yet a key question persists: can artificial intelligence deliver genuine mathematical insight, or does it remain a sophisticated assistant? Advanced tools like AlphaEvolve and Meta’s PatternBoost support human exploration by rapidly generating and evaluating ideas, functioning as a creative brainstorming partner. Mathematicians such as Geordie Williamson envision a future where artificial intelligence helps unearth mathematical objects with the potential to shape the discipline, but emphasize that intuition and conceptual breakthroughs, like inventing the icosahedron, remain uniquely human. Ultimately, artificial intelligence is positioned as an invaluable scout and collaborator that accelerates progress in mathematics, but the core of true discovery still lies with human curiosity and ingenuity.