Mixture of experts powers frontier AI models, 10x faster on NVIDIA Blackwell NVL72

The top 10 most intelligent open-source models use mixture-of-experts designs that activate only the most relevant experts per token. NVIDIA GB200 NVL72 delivers a 10x leap in performance and performance per watt for models such as Kimi K2 Thinking, DeepSeek-R1 and Mistral Large 3.

A mixture-of-experts, or MoE, architecture is now the dominant pattern behind leading frontier AI models because it routes each token to a small set of specialized experts rather than activating all model parameters. The article cites the Artificial Analysis leaderboard, where the top 10 most intelligent open-source models all adopt MoE designs, including DeepSeek's DeepSeek-R1, Moonshot AI's Kimi K2 Thinking, OpenAI's gpt-oss-120B and Mistral AI's Mistral Large 3. By selecting only the experts relevant to a given token, MoE models raise intelligence and adaptability while containing compute and energy costs relative to dense models, which use every parameter for every token.
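The routing step described above can be sketched in a few lines: a router scores every expert for each token, and only the top-k experts are activated, with their outputs blended by normalized gate weights. This is a minimal illustration of top-k gating in general, not the router of any specific model named here; the shapes and expert count are made up for the example.

```python
import numpy as np

def moe_route(router_logits: np.ndarray, top_k: int = 2):
    """Select the top_k experts per token from router logits.

    Returns the chosen expert indices and softmax-normalized gate
    weights over just those experts (a generic MoE gating sketch).
    """
    # Indices of the top_k largest logits for each token.
    top_idx = np.argsort(router_logits, axis=-1)[:, -top_k:]
    top_logits = np.take_along_axis(router_logits, top_idx, axis=-1)
    # Softmax over only the selected experts, so gates sum to 1 per token.
    exp = np.exp(top_logits - top_logits.max(axis=-1, keepdims=True))
    gates = exp / exp.sum(axis=-1, keepdims=True)
    return top_idx, gates

# 3 tokens scored against a hypothetical 8-expert layer.
rng = np.random.default_rng(0)
logits = rng.normal(size=(3, 8))
idx, gates = moe_route(logits, top_k=2)
```

Because only `top_k` of the experts run per token, compute scales with the active experts rather than the full parameter count, which is the efficiency argument the article makes for MoE over dense models.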

Scaling MoE in production has been constrained by memory pressure and by the latency of distributing experts across multiple GPUs. NVIDIA's answer is extreme co-design in the GB200 NVL72 rack-scale system, which integrates 72 NVIDIA Blackwell GPUs into a single NVLink fabric with 130 TB/s of NVLink connectivity, 30 TB of fast shared memory and 1.4 exaflops of AI performance. That design lets expert parallelism span up to 72 GPUs, reducing the number of experts per GPU, easing parameter-loading demands on high-bandwidth memory and accelerating all-to-all expert communication. Software and format optimizations, including NVIDIA Dynamo, NVFP4 and support from TensorRT-LLM, SGLang and vLLM, help orchestrate disaggregated serving of prefill and decode tasks to maximize inference throughput and efficiency.
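The memory-pressure argument above is simple ceiling-division arithmetic: widening expert parallelism spreads a layer's experts over more GPUs, so each GPU holds and loads fewer expert weights. A small sketch, using a hypothetical 256-expert layer (the expert count is illustrative, not taken from any model in the article):

```python
import math

def experts_per_gpu(num_experts: int, ep_degree: int) -> int:
    """Experts each GPU must hold when expert parallelism spans
    ep_degree GPUs (ceiling division; illustrative sizing only)."""
    return math.ceil(num_experts / ep_degree)

# Hypothetical 256-expert MoE layer: an 8-GPU expert-parallel group
# versus expert parallelism across a 72-GPU NVLink domain.
narrow = experts_per_gpu(256, 8)   # 32 experts resident per GPU
wide = experts_per_gpu(256, 72)    # 4 experts resident per GPU
```

Fewer resident experts per GPU means less high-bandwidth-memory traffic per decode step, which is the mechanism behind the throughput and efficiency gains the article attributes to the 72-GPU NVLink domain.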

NVIDIA reports a 10x generational leap in performance per watt on GB200 NVL72 for multiple MoE models compared with prior-generation platforms such as NVIDIA HGX H200, citing Kimi K2 Thinking, DeepSeek-R1 and Mistral Large 3 as examples. Cloud providers and partners are deploying GB200 NVL72, and customers including CoreWeave, DeepL and Fireworks AI are using the rack-scale design to run and serve large MoE models. The article positions MoE as a fundamental architecture for future multimodal and agentic systems, and presents GB200 NVL72 as the infrastructure enabling wide expert parallelism and materially lower per-token cost and power consumption for frontier AI workloads.

Impact Score: 70

Apple faces delays and internal crisis over major Siri overhaul

Apple is reportedly struggling to deliver a fully overhauled Siri powered by large language models, with the most ambitious version now not expected until at least iOS 20, raising concerns about its AI strategy and competitiveness.

Artificial intelligence maps data-driven strategies to fight cancer

A new artificial intelligence-powered analysis is identifying the most effective levers for cancer prevention, early detection and treatment, with a focus on equity and global collaboration. The data-driven roadmap is already influencing policy discussions and could reshape long-term cancer control strategies.
