Nvidia has introduced its Vera CPUs as standalone systems-on-chip (SoCs) aimed at third-party customers, positioning them as competitors to Intel’s Xeon and AMD’s EPYC server processors. Although these processors are expected to function as general-purpose CPUs that can interface with a wide range of accelerators, the Vera generation appears to be tuned specifically for Nvidia GPUs. The SoC contains a hardware bug that triggers errors when it is paired with non-Nvidia graphics cards or accelerators, raising concerns about reliable operation and system installation in heterogeneous environments.
The source of the problem lies in how Vera’s PCIe controllers generate memory addresses. Under certain conditions, they produce invalid addresses that disrupt communication with third-party accelerators. The issue occurs during PCIe memory-mapped I/O (MMIO) write operations, when the CPU issues a write with partial byte enables (a write that updates only some of the bytes in a data word) to an MMIO region. The problem becomes more severe when these regions are mapped with Arm’s Normal Non-cacheable memory attribute, ‘MT_NORMAL_NC’, which creates significant compatibility challenges for devices that rely on precise memory ordering and addressing.
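To make the term "partial byte enables" concrete, the following is a minimal illustrative sketch, not taken from Nvidia's erratum text or any driver. It computes the 4-bit first and last byte-enable masks that a PCIe memory write would carry for a given address and length, following the general PCIe convention that each mask bit covers one byte of a 4-byte DWORD. The function names `byte_enables` and `is_partial_write` are hypothetical, and the sketch assumes the CPU emits exactly one write per call; real hardware may split or merge writes, and the article does not say which exact patterns trigger the Vera bug.

```python
def byte_enables(addr: int, length: int) -> tuple[int, int]:
    """Return (first_be, last_be) 4-bit masks for a write of `length` bytes at `addr`.

    Illustrative model only: bit i of a mask enables byte i of its DWORD.
    """
    if length <= 0:
        raise ValueError("length must be positive")

    start = addr % 4                      # offset of the first byte in its DWORD
    end = (addr + length - 1) % 4         # offset of the last byte in its DWORD
    first_dw = addr // 4
    last_dw = (addr + length - 1) // 4

    if first_dw == last_dw:
        # The write fits in one DWORD: enable bytes start..end.
        # By PCIe convention the last-DWORD mask is 0 for a single-DWORD request.
        first_be = ((1 << (end - start + 1)) - 1) << start
        return first_be, 0

    # The write spans several DWORDs: trim the first and last ones.
    first_be = (0xF << start) & 0xF
    last_be = (1 << (end + 1)) - 1
    return first_be, last_be


def is_partial_write(addr: int, length: int) -> bool:
    """True if any DWORD touched by the write has only some of its bytes enabled."""
    first_be, last_be = byte_enables(addr, length)
    return first_be != 0xF or last_be not in (0x0, 0xF)
```

Under these assumptions, an aligned 4-byte write such as `is_partial_write(0x1000, 4)` is a full write, whereas a single-byte write such as `is_partial_write(0x1001, 1)` or an unaligned 4-byte write such as `is_partial_write(0x1002, 4)` is partial. Those partial cases are the category of access the article describes.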
Because Arm applies more relaxed ordering rules to Normal Non-cacheable memory, accesses to regions mapped this way can trigger the erratum in Vera, leading to erroneous address generation, data corruption, and potential PCIe device failure. These failures are particularly likely during DMA-intensive workloads such as AI training or large-scale HPC simulations, where accelerators perform frequent and heavy data transfers. Nvidia GPUs are co-designed with Vera CPUs and their specific memory-ordering model, so no such issues have been reported when Vera is paired with Nvidia’s own accelerators, in sharp contrast with the experience on AMD GPUs and other third-party hardware.
