NVIDIA’s Blackwell platform led MLPerf Training 6.0 across all categories, posting the fastest time to train on every benchmark and submitting the only results that cover all seven workloads. The suite added mixture-of-experts (MoE) pretraining tests for DeepSeek-V3 671B and GPT-OSS-20B, highlighting the growing role of MoE architectures in frontier model training.
NVIDIA submitted results on GB200 NVL72 and GB300 NVL72 rack-scale systems, where fifth-generation NVLink Switches connect all 72 GPUs into a unified pool of compute and memory. GB300 NVL72 delivered up to 1.6x faster training than GB200 NVL72 at the same scale, with gains tied to NVFP4 compute density, expanded memory capacity and a higher power ceiling.
At scale, NVIDIA ran DeepSeek-V3 671B on 8,192 GPUs using GB200 NVL72 systems and submitted Llama 3.1 405B results at 5,120 GPUs. Microsoft Azure reached the Llama 3.1 405B reference quality target in 7.07 minutes on 8,192 GPUs, while CoreWeave hit the DeepSeek-V3 671B target in 2.02 minutes at 8,192-GPU scale using GB300 NVL72 systems and Spectrum-X Ethernet.
Reliability messaging centered on production training jobs that can run for weeks or months across hundreds of thousands of GPUs. NVIDIA described GPU screening across 30+ manufacturing test stages, chip monitoring through its Reliability, Availability and Serviceability Engine, Spectrum-X Ethernet rerouting around failed links in milliseconds, and NVRx checkpoint-based recovery for interrupted nodes.
