NVIDIA’s Blackwell platform led MLPerf Training 6.0, a peer-reviewed industry benchmark suite for AI training performance, with the fastest time to train on all seven benchmarks. The round added DeepSeek-V3 671B and GPT-OSS-20B mixture-of-experts pretraining workloads, and NVIDIA was the only platform submitted across every benchmark.
Submissions used GB200 NVL72 and GB300 NVL72 rack-scale systems, in which fifth-generation NVLink Switches connect all 72 GPUs into a unified pool of compute and memory. At the same scale, GB300 NVL72 trained up to 1.6x faster than GB200 NVL72, helped by NVFP4 precision, expanded memory capacity and a higher power ceiling.
NVIDIA scaled DeepSeek-V3 671B to 8,192 GPUs on GB200 NVL72 systems, the largest Blackwell-based submission in MLPerf Training to date, and submitted Llama 3.1 405B at 5,120 GPUs. Microsoft Azure reached the Llama 3.1 405B reference quality target in 7.07 minutes on 8,192 GPUs, and CoreWeave reached the DeepSeek-V3 671B target in 2.02 minutes on GB300 NVL72 systems connected with Spectrum-X Ethernet.
The platform’s reliability stack includes more than 30 manufacturing test stages, Reliability, Availability and Serviceability (RAS) Engine monitoring, millisecond-scale traffic rerouting in Spectrum-X Ethernet, and NVIDIA Resiliency Extension for fault detection, recovery and cluster health monitoring. NVIDIA also cited participation from 19 ecosystem organizations, naming partners such as Cohere, Midjourney, Thinking Machines Lab and Higgsfield.
