Mixture of experts powers frontier AI models, 10x faster on NVIDIA Blackwell NVL72

The top 10 most intelligent open-source models use mixture-of-experts designs to activate only the most relevant experts per token. NVIDIA GB200 NVL72 delivers a 10x performance and performance-per-watt leap for models such as Kimi K2 Thinking, DeepSeek-R1 and Mistral Large 3.

A mixture-of-experts, or MoE, architecture is now the dominant pattern behind leading frontier AI models because it routes each token to a small set of specialized experts rather than through every model parameter. The article cites the Artificial Analysis leaderboard, where the top 10 most intelligent open-source models all adopt MoE designs, including DeepSeek AI’s DeepSeek-R1, Moonshot AI’s Kimi K2 Thinking, OpenAI’s gpt-oss-120B and Mistral AI’s Mistral Large 3. By activating only the experts relevant to a given token, MoE models raise intelligence and adaptability while containing compute and energy costs relative to dense models, which use every parameter for every token.
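To make the routing idea concrete, here is a minimal PyTorch sketch of top-k expert routing: a router scores each token against every expert, only the top-k experts are evaluated for that token, and their outputs are mixed by the normalized router weights. The hidden size, expert count and top-k value are illustrative assumptions, not the configuration of any model named above.

```python
# Minimal top-k mixture-of-experts routing sketch (illustrative sizes only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, hidden: int = 64, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # The router scores every token against every expert.
        self.router = nn.Linear(hidden, num_experts)
        # Each expert is a small, independent feed-forward block.
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(hidden, 4 * hidden), nn.GELU(), nn.Linear(4 * hidden, hidden))
             for _ in range(num_experts)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, hidden). Keep only the top_k experts per token and mix
        # their outputs by the normalized router weights; every other expert
        # is skipped entirely for that token.
        weights, idx = torch.topk(self.router(x), self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

tokens = torch.randn(16, 64)           # 16 tokens, hidden size 64
print(TopKMoE()(tokens).shape)         # torch.Size([16, 64])
```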

Scaling MoE in production has been constrained by memory pressure and the latency of distributing experts across multiple GPUs. NVIDIA’s answer is extreme co-design in the GB200 NVL72 rack-scale system, which integrates 72 NVIDIA Blackwell GPUs into a single NVLink fabric with 130 TB/s of NVLink connectivity, 30 TB of fast shared memory and 1.4 exaflops of AI performance. That design lets expert parallelism span up to 72 GPUs, reducing the number of experts each GPU must hold, easing parameter-loading demands on high-bandwidth memory and accelerating all-to-all expert communication. Software and format optimizations, including NVIDIA Dynamo, NVFP4 and support in TensorRT-LLM, SGLang and vLLM, orchestrate disaggregated serving that splits prefill and decode work across the system to maximize inference throughput and efficiency.
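A rough sketch of why wider expert parallelism eases memory pressure, using assumed round numbers (the expert count, MoE layer count and per-expert weight size below are placeholders, not measured figures for GB200 NVL72 or any specific model): spreading the same expert pool over more GPUs shrinks the slice of expert weights each GPU must keep in high-bandwidth memory.

```python
# Back-of-the-envelope arithmetic: experts and expert-weight memory per GPU
# as the expert-parallel (EP) group widens. All constants are assumptions.
EXPERTS_PER_LAYER = 256            # assumed routed experts per MoE layer
MOE_LAYERS = 58                    # assumed number of MoE layers
BYTES_PER_EXPERT = 44 * 2**20      # assumed ~44 MiB of weights per expert

def per_gpu(ep_width: int) -> tuple[float, float]:
    """Experts per GPU and GiB of expert weights per GPU at a given EP width."""
    experts = EXPERTS_PER_LAYER * MOE_LAYERS / ep_width
    gib = experts * BYTES_PER_EXPERT / 2**30
    return experts, gib

for width in (8, 16, 32, 72):
    experts, gib = per_gpu(width)
    print(f"EP={width:>2}: {experts:7.1f} experts/GPU, ~{gib:6.1f} GiB of expert weights/GPU")
```

Under these assumed numbers, going from 8-way to 72-way expert parallelism cuts per-GPU expert-weight memory by roughly 9x, which is the lever the NVLink-connected 72-GPU domain is meant to pull.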

NVIDIA reports a 10x generational leap in performance per watt on GB200 NVL72 for multiple MoE models compared with prior-generation platforms such as the NVIDIA HGX H200; Kimi K2 Thinking, DeepSeek-R1 and Mistral Large 3 are cited as examples of this improvement. Cloud providers and partners are deploying GB200 NVL72, and customers including CoreWeave, DeepL and Fireworks AI are using the rack-scale design to run and serve large MoE models. The article positions MoE as a fundamental architecture for future multimodal and agentic systems and presents GB200 NVL72 as the infrastructure enabling wide expert parallelism and materially lower per-token cost and power consumption for frontier AI workloads.

Impact Score: 70

OpenAI trains LLM to confess to bad behavior

OpenAI is experimenting with model “confessions” that describe how a large language model carried out a task and admit when it lied or cheated. The technique is intended to make systems more trustworthy as they are deployed in AI applications.

The current state of AI in science

AI is reshaping biological research and other scientific fields through foundation models and integrated platforms, with Google DeepMind’s AlphaFold and WeatherNext among the technologies driving rapid application and industrialization.

Nvidia: Latest news and insights

A running roundup of Nvidia’s products, partnerships and controversies shaping enterprise AI through Dec 3, 2025.
