NVIDIA’s Vera CPU emerged as a strong early contender in server benchmarking, with results indicating it can outperform recent Intel Xeon and AMD EPYC processors in selected data center workloads. The Arm-based platform reflects NVIDIA’s push deeper into custom CPU design and, in the reported tests, ranked ahead of both x86 rivals and NVIDIA’s earlier Grace design.
Vera is equipped with 88 custom Armv9.2 Olympus cores and exposes 176 threads through physical resource partitioning. The cores support native FP8 processing and use a 6×128-bit SVE2 implementation, allowing certain AI workloads to run directly on the CPU. The chip offers 1.2 TB/s of memory bandwidth and supports up to 1.5 TB of LPDDR5X memory in the SOCAMM2 format. A second-generation Scalable Coherency Fabric provides 3.4 TB/s of bisection bandwidth and connects the cores across a single monolithic die, avoiding the cross-die latency penalties common in chiplet architectures.
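To put the headline figures in per-core terms, the short Python sketch below divides the quoted aggregate bandwidths evenly across cores and threads. It is back-of-the-envelope arithmetic only: the even split is an idealization, the reading of "6×128-bit" as six 128-bit vector pipes is an assumption, and the FP8 lane count assumes 8-bit elements fill the full vector width. None of the derived numbers are measured results.

```python
# Figures quoted in the article.
CORES = 88
THREADS = 176
MEM_BW_TBPS = 1.2      # memory bandwidth, TB/s
FABRIC_BW_TBPS = 3.4   # fabric bisection bandwidth, TB/s

# Assumption (not stated in the article): "6x128-bit SVE2" means
# six 128-bit vector pipes per core.
SVE2_PIPES = 6
SVE2_WIDTH_BITS = 128


def per_unit_gbps(total_tbps: float, units: int) -> float:
    """Split an aggregate TB/s figure evenly into GB/s per unit (1 TB = 1000 GB)."""
    return total_tbps * 1000 / units


mem_per_core = per_unit_gbps(MEM_BW_TBPS, CORES)
mem_per_thread = per_unit_gbps(MEM_BW_TBPS, THREADS)
fabric_per_core = per_unit_gbps(FABRIC_BW_TBPS, CORES)

# Total vector width per core, and how many 8-bit (FP8) elements fit in it.
vector_bits_per_core = SVE2_PIPES * SVE2_WIDTH_BITS
fp8_lanes_per_core = vector_bits_per_core // 8

print(f"Memory bandwidth per core:   {mem_per_core:.2f} GB/s")
print(f"Memory bandwidth per thread: {mem_per_thread:.2f} GB/s")
print(f"Fabric bandwidth per core:   {fabric_per_core:.2f} GB/s")
print(f"Vector width per core:       {vector_bits_per_core} bits")
print(f"FP8 lanes per core:          {fp8_lanes_per_core}")
```

Real per-core bandwidth depends on access patterns and contention, so these values are best read as rough upper-bound shares rather than predictions.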
The comparison set included single- and dual-socket Intel Xeon 6980P (Granite Rapids) systems, along with AMD EPYC Turin and Turin Dense parts such as the EPYC 9755, 9575F, and 9475F. NVIDIA also allowed comparison with its first-generation Grace processor, which is based on Arm Neoverse V2 cores. Testing on the pre-release chip was limited to a subset of workloads, including code compilation, STREAM memory performance, video encoding, Python and Java workloads, and database performance.
In the geometric mean of all test results, NVIDIA’s Vera topped the chart, performing nearly 11% better than AMD’s most advanced designs and 55.3% better than the best single-socket Intel Xeon. It also outperformed the dual-socket configurations, suggesting that some workloads scale poorly across multiple sockets. These limited results place Vera above any other Arm-based design. The CPU has a 450 W TDP, with a further 50 W for the 768 GB memory pool.
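For readers unfamiliar with the metric, a geometric mean is commonly used to summarize benchmarks with different units and scales, because a single outsized result cannot dominate it the way it would an arithmetic average. The Python sketch below shows how such a summary and a percentage lead could be computed. The function names and the scores are hypothetical and invented purely for illustration; they are not the reported results, and the article does not say how the individual tests were normalized.

```python
import math


def geometric_mean(values):
    """Return the geometric mean of a non-empty list of positive numbers."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs a non-empty list of positive values")
    # Averaging the logs avoids overflow from multiplying many values together.
    return math.exp(sum(math.log(v) for v in values) / len(values))


def relative_lead_percent(a_scores, b_scores):
    """Percentage by which A's geometric mean exceeds B's (negative if behind)."""
    return (geometric_mean(a_scores) / geometric_mean(b_scores) - 1) * 100


# HYPOTHETICAL normalized scores, higher is better, one entry per workload.
# These are made-up illustration values, not data from the tests.
cpu_a = [1.20, 0.95, 1.40, 1.10]
cpu_b = [1.00, 1.00, 1.00, 1.00]

lead = relative_lead_percent(cpu_a, cpu_b)
print(f"CPU A leads CPU B by {lead:.1f}% on the geometric mean")
```

Results where lower is better, such as compile times, would need to be inverted before being combined with higher-is-better scores in this way.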
