NVIDIA presented the GB10 Grace Blackwell superchip as a compact multi-die system-on-chip built on TSMC's 3 nm process. The package pairs a MediaTek-sourced Arm CPU die with a Blackwell GPU die in a 2.5D arrangement. NVIDIA positioned the design to bring datacenter-class capabilities into desktop-sized workstations while keeping power and thermal limits appropriate for small form factors.
The CPU subsystem implements 20 Armv9.2 cores arranged as two clusters of ten cores. Each core has a private L2 cache, and each cluster is backed by a 16 megabyte shared L3, for 32 megabytes of L3 in total. Memory is provided by a unified LPDDR5X-9400 fabric on a 256-bit bus, supporting up to 128 gigabytes and delivering roughly 301 gigabytes per second of raw bandwidth to the package. High-speed I/O is concentrated on the CPU die, with NVMe storage and peripherals on its PCIe lanes and a ConnectX-7 network interface attached via a PCIe Gen 5 x8 link for multi-unit networking.
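The quoted memory bandwidth follows directly from the transfer rate and bus width, and the same arithmetic gives a rough ceiling for the ConnectX-7 link. The sketch below is illustrative only: the function names are hypothetical, and the PCIe Gen 5 figures (32 GT/s per lane, 128b/130b encoding) are standard specification values rather than numbers from NVIDIA's presentation. Both results are theoretical peaks, not measured throughput.

```python
def memory_bandwidth_gb_s(transfer_rate_mt_s: float, bus_width_bits: int) -> float:
    """Theoretical peak bandwidth in decimal GB/s for a DRAM interface."""
    bytes_per_transfer = bus_width_bits / 8
    return transfer_rate_mt_s * 1e6 * bytes_per_transfer / 1e9


def pcie_gen5_bandwidth_gb_s(lanes: int) -> float:
    """Theoretical per-direction PCIe Gen 5 bandwidth in decimal GB/s.

    Assumes 32 GT/s per lane and 128b/130b line encoding; packet and
    protocol overheads are ignored, so real throughput is lower.
    """
    gigatransfers_per_lane = 32.0
    encoding_efficiency = 128 / 130
    return gigatransfers_per_lane * lanes * encoding_efficiency / 8


if __name__ == "__main__":
    # LPDDR5X-9400 on a 256-bit bus: about 300.8 GB/s, i.e. "roughly 301".
    print(f"LPDDR5X-9400 x 256-bit: {memory_bandwidth_gb_s(9400, 256):.1f} GB/s")
    # PCIe Gen 5 x8 link to the NIC: about 31.5 GB/s in each direction.
    print(f"PCIe Gen 5 x8:          {pcie_gen5_bandwidth_gb_s(8):.1f} GB/s per direction")
```

On these assumptions the network link peaks at roughly a tenth of the local memory bandwidth, which is worth keeping in mind when sizing workloads that span more than one unit.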
The GPU die is a scaled Blackwell configuration tuned for low-power, small-form-factor operation. It carries a 24 megabyte L2 cache that can be made visible to the CPU, creating a coherent cache hierarchy across the two dies. NVIDIA quoted peak throughput of about 31 TeraFLOPS for FP32 and approximately 1,000 TOPS using the NVFP4 reduced-precision format. The inter-die C2C link provides aggregate bandwidth on the order of 600 gigabytes per second, enabling low-latency sharing without heavy software-managed copying. The package is rated at a TDP of approximately 140 watts and exposes multi-display outputs including DisplayPort alt-mode and HDMI 2.1a, alongside security and virtualization features aimed at professional workloads.
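One way to put the quoted compute and memory figures side by side is a simple roofline estimate. The sketch below is not an NVIDIA model: it assumes the quoted peaks (about 31 TFLOPS FP32, roughly 301 GB/s of memory bandwidth) are both reachable and ignores the effect of caches. The function names are hypothetical.

```python
PEAK_FP32_TFLOPS = 31.0   # quoted FP32 peak
MEMORY_BW_GB_S = 301.0    # quoted raw LPDDR5X bandwidth


def ridge_point_flop_per_byte(peak_tflops: float, bandwidth_gb_s: float) -> float:
    """Arithmetic intensity (FLOP/byte) at which compute and memory limits meet."""
    return (peak_tflops * 1e12) / (bandwidth_gb_s * 1e9)


def attainable_tflops(intensity_flop_per_byte: float,
                      peak_tflops: float = PEAK_FP32_TFLOPS,
                      bandwidth_gb_s: float = MEMORY_BW_GB_S) -> float:
    """Roofline bound: the lower of the compute peak and the bandwidth-limited rate."""
    bandwidth_limited = intensity_flop_per_byte * bandwidth_gb_s * 1e9 / 1e12
    return min(peak_tflops, bandwidth_limited)


if __name__ == "__main__":
    ridge = ridge_point_flop_per_byte(PEAK_FP32_TFLOPS, MEMORY_BW_GB_S)
    print(f"Ridge point: about {ridge:.0f} FLOP per byte")
    for intensity in (1, 10, 100, 200):
        print(f"{intensity:>4} FLOP/byte -> {attainable_tflops(intensity):.2f} TFLOPS")
```

Under these assumptions, FP32 kernels that perform fewer than about 100 floating-point operations per byte fetched from memory would be limited by bandwidth rather than by compute.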