A look under the hood of many frontier models shows a preference for mixture-of-experts (MoE) architectures. The article explains that MoE models mimic the human brain by activating only the experts relevant to each token, which reduces compute requirements while increasing token-generation efficiency. On the independent Artificial Analysis (AA) leaderboard, the top 10 most intelligent open-source models all use an MoE architecture, and the article cites DeepSeek AI’s DeepSeek-R1, Moonshot AI’s Kimi K2 Thinking, OpenAI’s gpt-oss-120B and Mistral AI’s Mistral Large 3 as examples among that top group.
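To make the per-token expert activation concrete, below is a minimal sketch of a top-k MoE feed-forward layer in PyTorch. The class name, dimensions, and the choice of 8 experts with top-2 routing are illustrative assumptions for this sketch, not details of any model named above.

```python
# Minimal sketch of per-token top-k expert routing, the mechanism the article
# describes. All sizes and the 8-expert / top-2 configuration are assumptions
# for illustration, not taken from any of the models mentioned above.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)  # gating network scores each expert per token
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                              # x: (tokens, d_model)
        scores = self.router(x)                        # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1) # keep only the k best experts per token
        weights = F.softmax(weights, dim=-1)           # normalize the selected experts' weights
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e in range(len(self.experts)):
                mask = idx[:, slot] == e               # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * self.experts[e](x[mask])
        return out                                     # only k of n experts ran for each token

# Example: 4 tokens, each processed by just 2 of the 8 experts
tokens = torch.randn(4, 512)
print(TopKMoE()(tokens).shape)  # torch.Size([4, 512])
```

Because each token passes through only `top_k` experts, the per-token compute stays close to that of a much smaller dense feed-forward layer even as the total parameter count grows with the number of experts, which is the efficiency argument the article makes.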
Scaling mixture-of-experts models in production is described as notoriously difficult because achieving both high efficiency and high performance requires close coordination between hardware and software. The article highlights NVIDIA’s GB200 NVL72 rack-scale systems as an example of extreme hardware-software codesign that makes MoE scaling practical. It reports a specific comparison: the Kimi K2 Thinking MoE model, ranked as the most intelligent open-source model on the AA leaderboard, sees a 10x performance leap on the NVIDIA GB200 NVL72 rack-scale system compared with the NVIDIA HGX H200. The piece frames that gain as a demonstration of how system-level engineering can unlock the efficiency benefits of MoE architectures.
Finally, the article ties the NVL72 results to other MoE deployments, saying the breakthrough builds on the performance already delivered for the DeepSeek-R1 and Mistral Large 3 MoE models. It concludes that mixture-of-experts is becoming the architecture of choice for frontier models and positions NVIDIA’s full-stack inference platform as key to realizing the architecture’s potential in production environments.
