DeepSeek-Prover-V2 Sets New Benchmarks in Neural Theorem Proving

DeepSeek-Prover-V2 debuts as an open-source large language model for formal theorem proving, released alongside ProverBench, a new benchmark for mathematical reasoning in Artificial Intelligence.

DeepSeek AI has introduced DeepSeek-Prover-V2, an open-source large language model built for formal theorem proving in Lean 4. Its initialization (cold-start) data comes from a recursive theorem-proving pipeline in which DeepSeek-V3 generates the training examples, so no manually written proofs are needed to start training. DeepSeek-Prover-V2 combines informal, human-like reasoning with strict formal proofs in a single model and sets new performance marks on neural theorem proving benchmarks.

A central feature of DeepSeek-Prover-V2 is its cold-start data generation process. DeepSeek-V3 is prompted to break complex theorems down into subgoals and to formalize those steps in Lean 4. A smaller 7B parameter model then handles the proof search for each subgoal, which keeps the computationally demanding search affordable. The result is a synthetic dataset that pairs high-level reasoning with detailed proof formalization. This dataset gives the model a starting point for reinforcement learning and improves its ability to tackle both familiar and novel mathematical challenges.
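To make the decomposition idea concrete, here is a minimal Lean 4 sketch. It is an illustration only and is not taken from DeepSeek's dataset: the theorem, its name, and the choice of subgoals are hypothetical. The first version shows a proof skeleton whose subgoals are stated but left open with `sorry`; the second shows the same skeleton after each subgoal has been proved separately.

```lean
-- Hypothetical example: a proof sketch with open subgoals.
-- Each `have` states an intermediate step; `sorry` marks a proof still to be found.
theorem double_le_sketch (a b : Nat) (h : a ≤ b) : a + a ≤ b + b := by
  have h1 : a + a ≤ a + b := by sorry  -- subgoal 1, left open
  have h2 : a + b ≤ b + b := by sorry  -- subgoal 2, left open
  exact Nat.le_trans h1 h2

-- The same sketch once every subgoal has been closed.
theorem double_le (a b : Nat) (h : a ≤ b) : a + a ≤ b + b := by
  have h1 : a + a ≤ a + b := Nat.add_le_add_left h a
  have h2 : a + b ≤ b + b := Nat.add_le_add_right h b
  exact Nat.le_trans h1 h2
```

Because the skeleton already type-checks up to the `sorry` placeholders, each subgoal can be searched for independently, and the complete proof follows once all of them are filled in.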

After training on the cold-start data, DeepSeek-Prover-V2 undergoes reinforcement learning with binary correct-or-incorrect feedback as the reward signal, which strengthens the link between informal reasoning and rigorous formal proof construction. The flagship version, with 671 billion parameters, achieves an 88.9% pass rate on MiniF2F-test and solves 49 of the 658 problems in PutnamBench, results that DeepSeek AI reports as state of the art for neural theorem proving. The proofs the model generated for MiniF2F are available for public review and analysis, supporting transparency and further research.
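As a rough illustration of the two quantities mentioned here, the Python sketch below models a correct-or-incorrect reward and computes a benchmark pass rate. The function names are hypothetical and this is not DeepSeek's training code; it only assumes that a proof checker returns a yes-or-no verdict for each attempt.

```python
def binary_reward(proof_verified: bool) -> float:
    """Return 1.0 if the proof checker accepted the proof, otherwise 0.0."""
    return 1.0 if proof_verified else 0.0


def pass_rate(solved: int, total: int) -> float:
    """Return the percentage of benchmark problems that were solved."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * solved / total


# PutnamBench figures reported in the article: 49 of 658 problems solved.
putnam_rate = pass_rate(49, 658)
print(f"PutnamBench: {putnam_rate:.1f}% of problems solved")  # prints 7.4%
```

The comparison shows how far apart the two benchmarks are in difficulty: an 88.9% pass rate on MiniF2F-test against roughly 7.4% on PutnamBench.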

Alongside the model, DeepSeek AI released ProverBench, a new benchmark dataset of 325 formalized math problems that spans competition-level questions and textbook examples across diverse mathematical domains. ProverBench includes recent American Invitational Mathematics Examination (AIME) problems together with a curated selection of tutorial and textbook material, so it can be used to evaluate models on both advanced and foundational mathematics. DeepSeek-Prover-V2 is available in two sizes, 7B and 671B parameters, to suit different computational budgets, with expanded context lengths supporting longer and more intricate proofs. The release marks a significant step for formal mathematics and neural theorem proving research.
