DeepSeek Unveils New Method for Scaling Reward Models with SPCT

DeepSeek AI reveals a novel approach to enhance the scalability of general reward models in Artificial Intelligence systems.

DeepSeek AI, a leader in the large language model field, has unveiled a novel technique to enhance the scalability of general reward models (GRMs) at inference time. The method, documented in a recent research paper, aims to improve reward generation by having the model dynamically produce principles and critiques, and is trained with rejective fine-tuning and rule-based online reinforcement learning.

At a time when the focus in scaling large language models has shifted toward the inference phase, DeepSeek's new method aligns with models like OpenAI's o1, which allocate additional compute to reasoning at test time. This reflects a growing trend of leveraging reinforcement learning to improve model performance by refining reasoning processes and decision-making capabilities.

DeepSeek's Self-Principled Critique Tuning (SPCT) addresses the challenge of scaling reward modeling for large language models by training GRMs to generate their own evaluation principles and critiques, then sampling and aggregating multiple such judgments at inference time. The training recipe combines rejective fine-tuning with rule-based online reinforcement learning, improving both the inference-time scalability and the quality of GRMs. Experimental results reported in the paper show SPCT outperforming existing reward modeling methods, setting the stage for further releases, including the anticipated R2 model from DeepSeek.
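The inference-time scaling idea described above can be sketched in a few lines: sample several independent reward judgments for the same response and aggregate them by voting. This is a minimal illustration, not DeepSeek's implementation; `sample_reward` is a hypothetical stand-in for a call to a generative reward model that would produce principles and critiques before emitting a score.

```python
import random
from collections import Counter

def sample_reward(response: str, rng: random.Random) -> int:
    """Hypothetical stand-in for one generative reward model call.
    In SPCT, each call would generate principles, critique the response
    against them, and emit a discrete score. Here we simulate a noisy
    1-10 score from a toy length heuristic."""
    base = min(10, max(1, len(response) // 5))
    return min(10, max(1, base + rng.choice([-1, 0, 0, 1])))

def inference_time_reward(response: str, k: int = 8, seed: int = 0) -> int:
    """Scale reward quality at inference by sampling k independent
    principle/critique judgments and taking a majority vote."""
    rng = random.Random(seed)
    scores = [sample_reward(response, rng) for _ in range(k)]
    # Majority vote over sampled scores; larger k smooths out noise
    # from any single judgment.
    return Counter(scores).most_common(1)[0][0]
```

Increasing `k` is the scaling knob: more sampled judgments cost more inference compute but yield a more stable aggregate reward, which is the trade-off the paper's inference-time scaling results measure.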

Tesla plans terafab for Artificial Intelligence chips

Tesla is moving toward a large-scale chip manufacturing project to support its autonomous driving roadmap. Elon Musk said the terafab effort for Artificial Intelligence chips will launch in seven days and may involve Intel, TSMC and Samsung.

Timeline traces evolution, civilisation and planetary stewardship

A sweeping chronology links cosmology, evolution, human history and modern environmental risk in a single long view of the human condition. The sequence culminates in contemporary debates over climate change, biodiversity loss and artificial intelligence governance.

Wolters Kluwer report tracks Artificial Intelligence shift in legal work

Wolters Kluwer’s 2026 Future Ready Lawyer findings show Artificial Intelligence has become a foundational tool across law firms and corporate legal departments. The survey points to measurable time savings, revenue growth, and rising pressure to strengthen training, ethics, and security.

Anthropic March 2026 release roundup

Anthropic rolled out a broad set of March 2026 updates across Claude Code, the Claude Developer Platform, Claude apps, and enterprise partnerships. Changes focused on larger context windows, workflow improvements, reliability fixes, visual output features, and new partner enablement programs.

China renews push to lead in technology and Artificial Intelligence

China’s 15th five-year plan elevates science and technology as core national priorities, with a strong emphasis on self-reliance and Artificial Intelligence. The blueprint signals heavier investment, broader industrial support, and a more confident bid to shape global technology standards.
