Huawei CloudMatrix 384 System Surpasses NVIDIA GB200 NVL72 in Total Performance

Huawei's new CloudMatrix 384 super node delivers higher system-wide performance than NVIDIA's flagship, redefining the race in Artificial Intelligence hardware at scale.

Huawei has unveiled its CloudMatrix 384 system super node, positioning it as a domestic challenger to NVIDIA's GB200 NVL72 system in the high-performance Artificial Intelligence hardware arena. The CloudMatrix 384 employs 384 Ascend 910C accelerators, dramatically outscaling NVIDIA's configuration of 36 Grace CPUs paired with 72 'Blackwell' GB200 GPUs. While the solution requires roughly five times more accelerators to deliver 1.7 times the performance of NVIDIA's NVL72, it's a significant step forward for Huawei in system-level deployment, despite lagging in per-chip efficiency and performance.

At the individual accelerator level, NVIDIA maintains clear leadership. Its GB200 GPU delivers over three times the BF16 performance of Huawei's Ascend 910C (2,500 vs. 780 TeraFLOPS), boasts larger on-chip memory (192 GB compared to 128 GB), and offers superior bandwidth (8 TB/s versus 3.2 TB/s). These specifications translate to raw power and energy efficiency advantages for NVIDIA at the chip scale. However, when the focus shifts to overall system capabilities, Huawei's CloudMatrix 384 pulls ahead: it achieves 1.7 times the aggregate PetaFLOPS, 3.6 times the HBM memory, and packs more than five times the number of accelerators, allowing broader scalability and bandwidth within a single supercomputer node.
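The system-level multiples quoted above follow directly from the per-chip figures. A quick back-of-envelope sketch, using only the TeraFLOPS, memory, and accelerator counts stated in this article, confirms them:

```python
# Per-accelerator specs quoted in the article (BF16 TFLOPS, HBM in GB).
ASCEND_910C = {"tflops": 780, "hbm_gb": 128}
GB200 = {"tflops": 2500, "hbm_gb": 192}

# Accelerators per system node.
CLOUDMATRIX_COUNT = 384
NVL72_COUNT = 72

# Aggregate compute (PFLOPS) and memory (GB) per node.
cm_pflops = CLOUDMATRIX_COUNT * ASCEND_910C["tflops"] / 1000  # ~300 PFLOPS
nv_pflops = NVL72_COUNT * GB200["tflops"] / 1000              # 180 PFLOPS
cm_hbm = CLOUDMATRIX_COUNT * ASCEND_910C["hbm_gb"]            # 49,152 GB
nv_hbm = NVL72_COUNT * GB200["hbm_gb"]                        # 13,824 GB

print(f"PFLOPS ratio: {cm_pflops / nv_pflops:.1f}x")             # 1.7x
print(f"HBM ratio:    {cm_hbm / nv_hbm:.1f}x")                   # 3.6x
print(f"Chip count:   {CLOUDMATRIX_COUNT / NVL72_COUNT:.1f}x")   # 5.3x
```

The 1.7x compute and 3.6x memory figures in the article are exactly these aggregate ratios: Huawei trades per-chip strength for sheer accelerator count.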

The trade-off for this system scalability is energy consumption. Huawei's solution draws close to four times more power than NVIDIA's: approximately 560 kW per CloudMatrix 384 system compared to 145 kW for a single GB200 NVL72. This means NVIDIA continues to lead on single-node peak efficiency, but for organizations building massive Artificial Intelligence superclusters where total throughput and interconnect speeds are critical, Huawei's approach is compelling. The all-to-all topology in Huawei's design enhances performance for large-scale training and inference tasks. Industry analysts note that as SMIC, the manufacturing partner for Huawei's chips, advances to newer semiconductor process nodes, future iterations could narrow or even close the current efficiency gap with NVIDIA.
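The efficiency gap can be made concrete by dividing each node's aggregate compute by its power draw. A minimal sketch using the figures above (560 kW vs. 145 kW, and the per-chip BF16 numbers):

```python
# Aggregate BF16 compute (PFLOPS) and system power draw (kW) from the article.
cm_pflops, cm_kw = 384 * 780 / 1000, 560  # CloudMatrix 384
nv_pflops, nv_kw = 72 * 2500 / 1000, 145  # GB200 NVL72

cm_eff = cm_pflops / cm_kw  # ~0.53 PFLOPS per kW
nv_eff = nv_pflops / nv_kw  # ~1.24 PFLOPS per kW

print(f"NVIDIA efficiency advantage: {nv_eff / cm_eff:.1f}x")  # 2.3x
```

On these numbers, NVIDIA delivers roughly 2.3 times more compute per kilowatt, which is why the article frames Huawei's win as one of total throughput rather than efficiency.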


Finance officials raise banking security concerns over Anthropic's Claude Mythos model

Anthropic’s Claude Mythos has prompted urgent discussions among finance ministers, central bankers and banks over the risk that advanced cyber capabilities could expose weaknesses in critical financial systems. Governments and financial institutions are being given early access to test and strengthen defences before any broader release.

UK delays Artificial Intelligence copyright reform

The UK government has postponed immediate copyright reform for Artificial Intelligence, leaving developers, creatives, and rightsholders to operate under existing law. Licensing, transparency, digital replicas, and future litigation are now set to shape the next phase of policy.

Memory architecture is central to autonomous LLM agents

Memory design, not just model choice, determines whether autonomous agents can sustain context, learn from experience, and stay reliable over time. A practical framework centers on how information is written, managed, and read across multiple memory types.
