NVIDIA sweeps MLPerf Training v5.1 benchmarks

NVIDIA posted the fastest times on all seven MLPerf Training v5.1 tests, demonstrating gains in Artificial Intelligence training that span GPUs, networking and software. The company submitted results on every benchmark and introduced Blackwell Ultra hardware along with NVFP4-based optimizations.

NVIDIA swept all seven tests in MLPerf Training v5.1 and was the only platform to submit results on every benchmark, underscoring the programmability of NVIDIA GPUs and the maturity of the CUDA software stack. The company reported top performance across large language models, image generation, recommender systems, computer vision and graph neural networks, and said its partners contributed a wide range of system submissions.

The GB300 NVL72 rack-scale system, powered by the NVIDIA Blackwell Ultra GPU architecture, made its MLPerf debut. NVIDIA reported more than 4x the Llama 3.1 405B pretraining performance and nearly 5x the Llama 2 70B LoRA fine-tuning performance of the prior-generation Hopper architecture at the same GPU count. Blackwell Ultra adds new Tensor Cores that deliver 15 petaflops of NVFP4 Artificial Intelligence compute, doubles attention-layer compute, and carries 288 GB of HBM3e memory. The Quantum-X800 InfiniBand platform, an end-to-end 800 Gb/s networking stack, also made its debut, doubling scale-out networking bandwidth over the prior generation.

A central technical advance in this round was the use of NVFP4 precision for training. NVIDIA said Blackwell GPUs can perform FP4 calculations, including the NVFP4 format and other FP4 variants, at double the rate of FP8, and that Blackwell Ultra raises that to three times FP8 performance. NVIDIA was the only submitter to use FP4 calculations while still meeting MLPerf Training accuracy requirements. Those optimizations helped set new records, including a 10-minute time to train Llama 3.1 405B using more than 5,000 Blackwell GPUs, and an 18.79-minute run using 2,560 Blackwell GPUs that was 45 percent faster than the prior Blackwell-based submission with 2,496 GPUs.
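NVIDIA has described NVFP4 as a 4-bit floating point format that pairs very low-precision elements with block-level scale factors to preserve accuracy. As a rough illustration of the idea only, not NVIDIA's hardware path, the NumPy sketch below simulates block-scaled FP4 quantization: the E2M1 value grid and the 16-element block size follow NVIDIA's public description of NVFP4, while the scale handling is a simplified assumption (the real format stores block scales in FP8 and applies an additional per-tensor FP32 scale, both omitted here).

    import numpy as np

    # Representable magnitudes of an E2M1 (4-bit) float:
    # sign x {0, 0.5, 1, 1.5, 2, 3, 4, 6}
    E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

    def quantize_fp4_blocks(x, block=16):
        """Simulate block-scaled FP4 quantization in float64.

        One scale per 16-value block, chosen so the block's largest
        magnitude maps to the FP4 maximum (6.0). This is a toy model,
        not NVIDIA's implementation.
        """
        x = np.asarray(x, dtype=np.float64).reshape(-1, block)
        scales = np.abs(x).max(axis=1, keepdims=True) / E2M1_GRID[-1]
        scales[scales == 0] = 1.0          # keep all-zero blocks safe
        scaled = x / scales
        # Round every scaled value to the nearest representable magnitude.
        idx = np.abs(np.abs(scaled)[..., None] - E2M1_GRID).argmin(axis=-1)
        return np.sign(scaled) * E2M1_GRID[idx] * scales

    rng = np.random.default_rng(0)
    w = rng.normal(size=32)
    w_q = quantize_fp4_blocks(w).ravel()
    print("max abs error:", np.abs(w - w_q).max())

In this toy setup, each block's rounding error stays proportional to that block's own range rather than to the whole tensor's, which is the basic reason block-scaled 4-bit formats can stay within training accuracy targets where a single per-tensor scale would not.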

NVIDIA also set records on the two benchmarks added this round. Llama 3.1 8B replaced BERT-large, and NVIDIA reported a 5.2-minute training time using 512 Blackwell Ultra GPUs. FLUX.1 replaced Stable Diffusion v2, and NVIDIA submitted a 12.5-minute result using 1,152 Blackwell GPUs. The company said it continued to hold records on the existing graph neural network, object detection and recommender system tests, and highlighted participation from 15 ecosystem organizations, including Dell Technologies, Hewlett Packard Enterprise, Lenovo, Supermicro and Lambda. NVIDIA described its annual innovation cadence as driving rapid performance increases across pretraining, post-training and inference.
