DeepSeek Unveils New Method for Scaling Reward Models with SPCT

DeepSeek AI reveals a novel approach to enhance the scalability of general reward models in Artificial Intelligence systems.

DeepSeek AI, a leader in the large language model field, has unveiled a novel technique to enhance the scalability of general reward models (GRMs) during the inference phase. The newly introduced method, documented in their recent research paper, is aimed at optimizing reward generation by dynamically producing principles and critiques, utilizing rejection fine-tuning and rule-based online reinforcement learning.

At a time when the focus on scaling large language models has shifted to the inference phase, DeepSeek´s new method aligns with emerging models like OpenAI’s o1, which prioritize enhanced reinforcement learning during model testing. This reflects a growing trend toward leveraging reinforcement learning to continuously improve model performance by refining reasoning processes and enhancing decision-making capabilities.

DeepSeek´s SPCT approach addresses the challenge of scaling reinforcement learning for large language models by introducing Self-Principled Critique Tuning during inference. This involves rejection fine-tuning and rule-based online reinforcement learning, enhancing both the scalability and quality of GRMs. Experimental results demonstrate the superiority of SPCT over existing methods, setting the stage for further releases, including the anticipated R2 model from DeepSeek.

70

Impact Score

Google expands Gemini for Science

Google is rolling out Gemini for Science, a set of experimental tools aimed at compressing scientific work that would typically take months or years into days. The effort combines multi-agent research systems, computational discovery tools, literature analysis, and database-connected life science assistants.

Europe weighs technology sovereignty push amid internal debate

Europe is preparing a new policy push to reduce reliance on major technology platforms, but internal disagreements are shaping the scope and pace of the effort. The Artificial Intelligence Development Act is due to be unveiled on June 3 after repeated delays.

EU Artificial Intelligence Act omnibus deal delays high-risk rules

A provisional EU agreement would push back key high-risk Artificial Intelligence Act deadlines while keeping major transparency duties on track for 2 August 2026. The deal also adds a new ban on non-consensual intimate imagery and child sexual abuse material generated by Artificial Intelligence systems.

Contact Us

Got questions? Use the form to contact us.

Contact Form

Clicking next sends a verification code to your email. After verifying, you can enter your message.