DeepSeek Unveils New Method for Scaling Reward Models with SPCT

DeepSeek AI reveals a novel approach to enhance the scalability of general reward models in Artificial Intelligence systems.

DeepSeek AI, a leader in the large language model field, has unveiled Self-Principled Critique Tuning (SPCT), a technique to improve the scalability of general reward models (GRMs) during the inference phase. The method, documented in their recent research paper, optimizes reward generation by training GRMs to dynamically produce principles and critiques, using rejection fine-tuning followed by rule-based online reinforcement learning.

At a time when the focus on scaling large language models has shifted to the inference phase, DeepSeek’s new method aligns with models like OpenAI’s o1, which allocate additional compute at test time. This reflects a growing trend toward leveraging reinforcement learning to continuously improve model performance by refining reasoning processes and enhancing decision-making capabilities.

DeepSeek’s SPCT approach addresses the challenge of scaling reward modeling for large language models: by combining rejection fine-tuning with rule-based online reinforcement learning, it enhances both the quality of GRMs and their scalability at inference time. Experimental results demonstrate the superiority of SPCT over existing methods, setting the stage for further releases, including the anticipated R2 model from DeepSeek.
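The inference-time scaling idea behind a generative reward model can be sketched as follows: sample several independent principle/critique generations for the same query-response pair, then aggregate the resulting scores. This is an illustrative sketch only; `generate_critique` is a hypothetical stand-in that simulates a model's noisy discrete scores, not DeepSeek's actual GRM, and the voting rule here is a simple majority over sampled scores.

```python
import random
from collections import Counter

def generate_critique(query: str, response: str, seed: int) -> int:
    """Hypothetical stand-in for one GRM call. In SPCT the model first
    generates principles, then a critique, then a discrete score; here
    we simulate a noisy score between 1 and 10."""
    rng = random.Random(seed)
    assumed_quality = 7  # assumed latent quality of the response
    return max(1, min(10, assumed_quality + rng.choice([-2, -1, 0, 0, 1])))

def scaled_reward(query: str, response: str, k: int = 8) -> tuple[int, list[int]]:
    """Inference-time scaling: draw k independent critique samples and
    aggregate their scores by majority vote."""
    scores = [generate_critique(query, response, seed=i) for i in range(k)]
    winner, _count = Counter(scores).most_common(1)[0]
    return winner, scores

reward, samples = scaled_reward("Explain overfitting.", "Some answer.", k=8)
```

Spending more samples (`k`) at inference buys a more stable reward estimate without retraining, which is the scaling axis the paper emphasizes.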


Siemens debuts digital twin composer for industrial metaverse deployments

Siemens has introduced digital twin composer, a software tool that builds industrial metaverse environments at scale by merging comprehensive digital twins with real-time physical data, enabling faster virtual decision making. Early deployments with PepsiCo report higher throughput, shorter design cycles and reduced capital expenditure through physics-accurate simulations and artificial intelligence driven optimization.

Cadence builds chiplet partner ecosystem for physical artificial intelligence and data center designs

Cadence has introduced a Chiplet Spec-to-Packaged Parts ecosystem aimed at simplifying chiplet design for physical artificial intelligence, data center and high performance computing workloads, backed by a roster of intellectual property and foundry partners. The program centers on a physical artificial intelligence chiplet platform and framework that integrates prevalidated components to cut risk and speed commercial deployment.

Patch notes detail split compute and IO tiles in Intel Diamond Rapids Xeon 7

Linux kernel patch notes reveal that Intel’s upcoming Diamond Rapids Xeon 7 server processors separate compute and IO tiles and adopt new performance monitoring and PCIe 6.0 support. The changes point to a more modular architecture and a streamlined product stack focused on 16-channel memory configurations.
