DeepSeek AI has released DeepSeek-Prover-V2, an open-source large language model built for formal theorem proving in Lean 4. Its initialization data is produced automatically by a recursive theorem-proving pipeline driven by DeepSeek-V3, an approach intended to make training more efficient. DeepSeek-Prover-V2 combines informal, human-like mathematical reasoning with strict formal proofs and sets new state-of-the-art results in neural theorem proving.
A central feature of DeepSeek-Prover-V2 is its cold-start data generation process. DeepSeek-V3 is prompted to break a complex theorem down into a sequence of subgoals and to formalize those steps in Lean 4. Each subgoal is then handed to a smaller 7B-parameter prover model, which carries out the computationally demanding proof search. The resolved subgoals are combined with DeepSeek-V3's high-level reasoning to form a synthetic dataset that pairs informal reasoning with the corresponding formal proof. This dataset serves as the starting point for reinforcement learning and is meant to improve the model's handling of both familiar and novel mathematical problems.
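To make the decomposition step concrete, here is a minimal Lean 4 sketch of what a decomposed proof skeleton could look like. The theorem, its name, and the choice of subgoals are illustrative assumptions, not taken from the DeepSeek-Prover-V2 data; the sketch also assumes Mathlib is available. Each `sorry` marks a subgoal left open for a separate proof search.

```lean
import Mathlib

-- Hypothetical example: the statement and subgoals are illustrative only.
theorem two_mul_le_add_sq_sketch (a b : ℝ) : 2 * a * b ≤ a ^ 2 + b ^ 2 := by
  -- Subgoal 1: a square is non-negative.
  have h1 : 0 ≤ (a - b) ^ 2 := by
    sorry
  -- Subgoal 2: expand the square.
  have h2 : (a - b) ^ 2 = a ^ 2 - 2 * a * b + b ^ 2 := by
    sorry
  -- Final step: combine h1 and h2 to conclude.
  sorry
```

In a pipeline of the kind described above, each placeholder would be replaced by a proof found for that subgoal, and the completed proof would then be checked by Lean as a whole.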
After training on the cold-start data, DeepSeek-Prover-V2 undergoes reinforcement learning with binary correct-or-incorrect feedback as the reward signal, which ties its informal reasoning more closely to rigorous formal proof construction. The flagship version, with 671 billion parameters, achieves an 88.9% pass rate on MiniF2F-test and solves 49 of the 658 problems in PutnamBench (about 7.4%), both state-of-the-art results in neural theorem proving. The proofs the model generated for MiniF2F are publicly available for review and analysis, supporting transparency and further research.
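The correct-or-incorrect feedback and the PutnamBench figure can be illustrated with a short Python sketch. The function names and the toy verifier are hypothetical stand-ins introduced here for illustration; in a real setup the verifier would be a call to the Lean 4 checker, and nothing below reflects DeepSeek's actual training code.

```python
from typing import Callable


def binary_reward(proof: str, verify: Callable[[str], bool]) -> float:
    """Return 1.0 if the verifier accepts the proof, otherwise 0.0."""
    return 1.0 if verify(proof) else 0.0


def pass_rate(solved: int, total: int) -> float:
    """Fraction of benchmark problems solved."""
    if total <= 0:
        raise ValueError("total must be positive")
    return solved / total


def toy_verify(proof: str) -> bool:
    """Stand-in for a Lean check (assumption): reject proofs with `sorry`."""
    return "sorry" not in proof


# Reward for an accepted proof and for one with an open placeholder.
accepted = binary_reward("theorem t : True := trivial", toy_verify)
rejected = binary_reward("theorem t : True := by sorry", toy_verify)

# PutnamBench figure quoted in the text: 49 of 658 problems solved.
putnam_rate = pass_rate(49, 658)
print(f"{putnam_rate:.1%}")  # 7.4%
```

The reward carries no partial credit: a proof either passes verification or it does not, which is what makes a formal checker a convenient source of training signal.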
Alongside the model, DeepSeek AI released ProverBench, a new benchmark of 325 formalized math problems that range from competition-level questions to textbook examples across several mathematical domains. ProverBench includes recent American Invitational Mathematics Examination (AIME) problems together with a curated selection of tutorial and textbook material, so it can be used to evaluate models on both advanced and foundational mathematics. DeepSeek-Prover-V2 is available in 7B and 671B parameter versions to suit different computational budgets, with extended context lengths to support longer, more intricate proofs. The release gives researchers in formal mathematics and neural theorem proving both an open model and a new benchmark to build on.